Abstract

This thesis describes a methodology, a representation, and an implemented
program for troubleshooting digital circuit boards at roughly the
level  of  expertise  one  might  expect  in  a  human  novice.  Existing  methods 
for  model-based  troubleshooting  have  not  scaled  up  to  deal  with  complex 
circuits, in part because traditional circuit models do not explicitly represent
aspects of the device that troubleshooters would consider important.
For  complex  devices  the  model  of  the  target  device  should  be  constructed 
with  the  goal  of  troubleshooting  explicitly  in  mind.  Given  that  methodology, 
the  principal  contributions  of  the  thesis  are  ways  of  representing  complex 
circuits  to  help  make  troubleshooting  feasible.  Temporally  coarse  behavior 
descriptions  are  a  particularly  powerful  simplification.  Instantiating  this  idea 
for  the  circuit  domain  produces  a  vocabulary  for  describing  digital  signals. 
The vocabulary has a level of temporal detail sufficient to make useful predictions
about the response of the circuit while it remains coarse enough
to  make  those  predictions  computationally  tractable.  Other  contributions 
are  principles  for  using  these  representations.  Although  not  embodied  in  a 
program, these principles are sufficiently concrete that models can be constructed
manually from existing circuit descriptions such as schematics, part
specifications,  and  state  diagrams.  One  such  principle  is  that  if  there  are 
components  with  particularly  likely  failure  modes  or  failure  modes  in  which 
their  behavior  is  drastically  simplified,  this  knowledge  should  be  incorporated 
into the model. Further contributions include the solution of technical problems
resulting from the use of explicit temporal representations and design
descriptions  with  tangled  hierarchies. 
Chapter 1
Introduction 


A field engineer plugs in a broken circuit board, makes a half
dozen  simple  probes  with  an  oscilloscope,  and  after  ten  minutes 
ends  up  swapping  a  chip,  which  fixes  the  problem. 

A model-based troubleshooting program spends a day simulating
the expected behavior of the same misbehaving board, and requests
that a logic analyzer be used to capture a certain subset of
the  signals.  After  some  hours  of  computation  it  concludes  that 
any  of  the  40  chips  or  400  wires  on  the  board  could  be  responsible 
for  the  misbehavior. 

What  does  the  field  engineer  know  that  the  program  does  not?  How  can 
a  model-based  troubleshooting  program  represent  and  use  that  knowledge? 
Both  the  program  and  the  field  engineer  have  the  circuit  schematic  and  the 
specifications  of  the  individual  chips.  The  field  engineer  additionally  has 
expectations  about  the  design  of  the  circuit,  expectations  about  which  signals 
in  the  circuit  ought  to  be  changing  and  how  fast,  and  expectations  about  the 
kinds  of  failures  that  are  likely  to  occur  in  digital  circuits.  Incorporating  this 
knowledge  into  the  circuit  model  makes  it  possible  to  be  more  discriminating 
in  the  generation  of  diagnoses  and  more  efficient  in  the  use  of  observations. 

1.1  Model-Based  Troubleshooting 

Model-based troubleshooting is driven by the interaction of observations and
predictions (Figure 1.1). A device model produces predictions about what
ought to be observed; comparison with observations of the actual device
produces discrepancies; these discrepancies are then traced to their possible
underlying causes in the model, and repairs of the actual device are proposed.


Figure 1.1: Model-Based Troubleshooting (diagram; surviving box labels: ACTUAL DEVICE, MODEL)

This report describes a model-based troubleshooting program. Its primary
input is a model of a digital circuit that is a network of components
and connections. Each component has a description of its dynamic time-dependent
behavior and each connection transmits signals between components.
The secondary input to the program is a description of the stimuli
presented  to  the  circuit  and  observations  of  its  actual  responses.  The  model 
uses  those  stimuli  to  predict  what  the  outcomes  of  observations  ought  to  be. 
When  discrepancies  are  discovered,  the  program  produces  lists  of  components 
that could be responsible for the discrepancies, ranked by their relative likelihood.
The program interactively suggests what observations should be made
next to discriminate among these possibilities, then uses any new observations
to incrementally focus on the correct diagnosis.
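
The comparison step at the heart of this loop can be sketched in a few lines of Python. This is only an illustration, not the implemented program: the function name find_discrepancies and the string-valued signal descriptions are assumptions made for the example, and the signal names are borrowed from the scenario in Section 1.2.

```python
def find_discrepancies(predictions, observations):
    """Return the signals whose observed description differs from the predicted one.

    Signals that were predicted but never observed yield no discrepancy,
    since nothing is known about them yet.
    """
    return {
        signal: (predictions[signal], observed)
        for signal, observed in observations.items()
        if signal in predictions and predictions[signal] != observed
    }


# Hypothetical predictions from a device model and probes of the actual device.
predictions = {"Clock": "5 MHz", "Interrupt": "changing", "Reset": "unasserted"}
observations = {"Clock": "5 MHz", "Interrupt": "constant 1"}

discrepancies = find_discrepancies(predictions, observations)
```

Only the Interrupt signal is reported here; Reset produces nothing because it has not yet been probed.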

Model-based troubleshooting has been extensively demonstrated on simple
devices. One of the prime motivations of this work is to scale up model-based
troubleshooting techniques to deal with significantly more complex
devices. The fundamental problems in scaling model-based troubleshooting
technology  to  do  this  can  be  understood  as  problems  within  each  element 
of  the  paradigm  (Figure  1.2).  These  five  problems  and  their  solutions  are 
discussed  individually  below. 

Figure 1.2: Model-Based Troubleshooting Problems

Models are incomplete. No model can possibly capture every detail of
the actual device. Lack of detail in the device representation means that
some  failures  will  be  indistinguishable  and  others  will  be  misdiagnosed.  For 
example,  if  a  wire  connecting  several  terminals  is  represented  as  a  single 
component,  then  the  program  will  diagnose  a  break  anywhere  along  the  wire 
as  a  failure  of  the  whole  wire.  If  the  model  says  that  the  only  devices  affecting 
the  state  of  the  wire  are  the  ones  that  it  was  meant  to  connect,  then  the 
troubleshooting  program  will  misdiagnose  a  short  between  that  wire  and 
another  as  having  been  caused  by  one  or  more  other  failures.  The  selection  of 
the  primitive  elements  of  the  device  representation  constitutes  a  commitment 
to  a  set  of  failures  worth  identifying  and  worth  distinguishing  from  each  other. 

Models  are  incomplete,  but  the  consequences  of  that  incompleteness  can 
be  controlled  in  part  by  the  choice  of  primitive  elements  and  their  connections 
to  each  other.  Principles  are  needed  for  making  these  choices  in  a  way  that 
sacrifices completeness in favor of efficiency, since the aspiration is to troubleshoot
circuits with many thousands of wires, transistors, and interactions
between  them.  One  such  principle  is  that  physically  separate  components 
with  indistinguishable  failure  effects  can  be  treated  as  a  single  component. 
Another  principle  is  that  components  whose  failures  result  in  the  same  repair 
can be treated as a single component. A third principle is that unlikely failures
are not worth representing explicitly, so that components whose failures
are individually very unlikely can all be treated as a single aggregate component
whose failure is more likely. These principles introduce additional
approximations  into  a  device  model  that  will  make  some  component  failures 
indistinguishable  from  one  another.  A  deeper  problem  arises  from  the  fact 
that  any  model  explicitly  represents  only  some  of  the  possible  interactions 
between  components;  the  program  will  misdiagnose  any  failures  involving 
interactions  that  the  model  does  not  represent.  The  standard  example  is 
an  unintentional  short  between  two  wires  that  are  unrelated  in  the  circuit 
structure  diagram.  The  best  that  the  troubleshooting  program  will  do  is  to 
diagnose  this  as  two  failures,  one  in  each  wire.  The  approach  taken  in  this 
work  is  not  a  general  solution:  at  any  given  level  of  detail,  decisions  about 
which  interactions  between  components  ought  to  be  represented  are  made 
solely  on  the  basis  of  what  is  needed  to  explain  the  normal  operation  of  the 
device.  In  the  case  of  wires,  only  the  interactions  with  the  devices  they  are 
supposed  to  be  connected  to  are  represented,  hence  shorts  are  misdiagnosed. 
When  it  comes  time  to  repair  the  two  wires  one  may  assume  that  their  true 
(mutual)  problem  will  be  discovered  by  visual  inspection. 
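
The second aggregation principle (components whose failures result in the same repair can be treated as one) can be illustrated with a small grouping routine. This is a sketch under stated assumptions: the part names, the repair labels such as "replace chip U7", and the function aggregate_by_repair are all hypothetical and do not come from the implemented model.

```python
from collections import defaultdict


def aggregate_by_repair(parts):
    """Group physical parts that share a repair action into one model component."""
    groups = defaultdict(list)
    for name, repair in parts:
        groups[repair].append(name)
    return dict(groups)


# Hypothetical parts list: (physical part, repair that fixes a failure of that part).
parts = [
    ("bond wire, pin 3", "replace chip U7"),
    ("bond wire, pin 4", "replace chip U7"),
    ("silicon die", "replace chip U7"),
    ("trace segment A", "replace wire W12"),
    ("trace segment B", "replace wire W12"),
]

components = aggregate_by_repair(parts)
```

Five physical parts collapse into two model components, so the program never spends a probe distinguishing failures that lead to the same repair.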

Observations are costly. Taking measurements is nearly always appropriately
regarded as being more costly than computation spent on choosing that
measurement.  The  problem  that  scaling  brings  is  that  the  more  complex  the 
device,  the  more  events  there  are  to  observe,  and  the  shorter  the  intervening 
intervals,  the  more  difficult  they  are  to  observe.  It  is,  for  example,  more 
costly to set up a logic analyzer to capture digital signals at particular moments
than it is to observe whether they are staying at a constant zero or
one.

Observations  are  costly,  and  although  there  is  nothing  that  can  be  done 
about  this  directly,  the  device  model  can  describe  signals  in  ways  that  are 
relatively  cheap  to  observe.  For  example,  it  is  easier  to  observe  whether  a 
particular  signal  is  rising  or  falling  than  to  observe  its  changing  value  at  every 
moment.  This  is  an  example  of  a  useful  temporal  abstraction;  a  long  sequence 
of  changes  of  value  can  be  summarized  into  a  simple  description  that  is  stable 
over  a  longer  time  interval.  A  behavior  model  can  use  this  kind  of  temporally 
abstract  observation  to  make  other  temporally  abstract  predictions,  without 
requiring  that  any  explicit  deductions  ever  be  made  about  the  individual 
changing  values.  As  a  general  principle  temporal  abstractions  are  useful 
because  they  provide  a  better  match  to  the  observations  that  can  be  made 
cheaply. 

Observations  are  incomplete  and  imprecise.  Discrepancies  can  only  be 
detected where observations can be made. But even when observations can
be made, they may be too coarse to detect discrepancies with the model. For
example, if the model predicts that a certain current should be flowing in a
wire, but the troubleshooter can only measure currents to within 20%, then
the  current  could  actually  be  wrong  and  yet  yield  no  apparent  discrepancy, 
hence  yield  no  new  information.  One  of  the  consequences  of  incomplete 
observations  is  that  there  will  inevitably  be  pairs  of  diagnoses  that  cannot 
be  discriminated,  since  their  only  difference  might  be  in  some  unobservable 
feature.  Inability  to  make  certain  observations  economically  imposes  limits 
on  the  ability  of  the  troubleshooting  program  to  isolate  faults. 
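
The effect of measurement imprecision on discrepancy detection can be made concrete with a short sketch. It assumes, for illustration only, that "within 20%" means a band of 20% of the predicted value on either side; the function name is_discrepancy and the sample current values are hypothetical.

```python
def is_discrepancy(predicted, measured, tolerance):
    """True only if the measurement lies outside the error band around the prediction.

    tolerance is a fraction of the predicted value (0.20 means 20%).
    """
    return abs(measured - predicted) > tolerance * abs(predicted)


# A predicted current of 10.0 units measured with a 20% instrument.
masked = is_discrepancy(10.0, 11.5, 0.20)    # off by 15%: hidden by the error band
detected = is_discrepancy(10.0, 13.0, 0.20)  # off by 30%: visible as a discrepancy
```

In the first case the current may really be wrong, yet the observation yields no new information.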

Because observations are incomplete, ambiguity among the logically possible
diagnoses is inevitable. If the troubleshooting goal is to find the most
likely  diagnosis,  however,  other  sources  of  information  are  available.  One  of 
these sources is information about the relative failure rates of different physical
components, from which the troubleshooter can produce a rank ordering
of  the  diagnoses  by  plausibility.  A  related  source  is  information  about  how 
components  usually  fail  and  what  misbehaviors  they  produce;  this  can  be 
used  to  refine  the  likelihood  estimates  for  some  diagnoses.  These  sources 
of  knowledge  alleviate  the  indiscriminacy  caused  by  incomplete  observations 
because they can be used to discount unlikely diagnoses and leave the remaining
(relatively more likely) ones behind.

Prediction  is  costly.  It  is  impractical  within  a  troubleshooting  session  to 
simulate  an  entire  circuit  board  at  the  gate  level  for  more  than  a  few  clock 
cycles.  The  culprit  is  not  the  structural  complexity  of  the  board  in  number  of 
gates  or  wires.  The  culprit  is  the  complexity  of  the  behavior  —  the  number 
of  events  that  happen  and  need  to  be  simulated.  Waiting  for  more  computing 
power  to  apply  to  the  problem  is  not  a  solution  if  the  boards  to  be  diagnosed 
themselves  get  faster  and  more  complex. 

Prediction is costly, but this can be addressed by using temporally abstract
behavior descriptions. Temporal abstractions can summarize many
individual  events  into  an  aggregate  description  stable  over  a  longer  interval. 
For example, a given signal may be described as a sequence of many thousands
of individual alternating zeroes and ones, or more abstractly in terms
of  the  number  of  falling  edges  that  have  appeared,  or  even  more  abstractly 
as  the  number  of  one-to-zero  cycles  per  unit  time.  Although  the  value  of  the 
underlying signal may be changing many times per second, the average number
of cycles per unit time may be relatively stable. Descriptions that are
stable in this way are less costly to make predictions from. For example, the
troubleshooting scenario to be presented shortly is simple because the behavioral
complexity of microprocessors can be reduced to a simple relationship
between  the  rates  of  change  at  their  inputs  and  outputs. 
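
The three levels of description mentioned above (individual values, a count of falling edges, and cycles per unit time) can be sketched as follows. This is a minimal illustration, not the vocabulary used by the program: the function names and the sampled-list representation of a signal are assumptions, and the 10 MHz sampling rate is chosen only to make the arithmetic easy.

```python
def falling_edges(samples):
    """Count one-to-zero transitions in a sequence of 0/1 samples."""
    return sum(1 for prev, cur in zip(samples, samples[1:]) if prev == 1 and cur == 0)


def cycles_per_second(samples, sample_rate_hz):
    """Average number of one-to-zero cycles per second over the sampled window."""
    return falling_edges(samples) * sample_rate_hz / len(samples)


# Most detailed level: 10,000 individual values of a square wave.
signal = [1, 0] * 5000

# More abstract: a single count of falling edges.
edges = falling_edges(signal)

# Most abstract: a rate that stays stable however long the signal runs.
rate = cycles_per_second(signal, sample_rate_hz=10_000_000)
```

Ten thousand values reduce to one count (5000 edges) and then to one rate (5 MHz), and a prediction made from the rate never has to mention the individual values.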

Predictions  are  incomplete.  A  consequence  of  using  abstract  models  of 
behavior to achieve more economical prediction is that the resulting predictions
may be imprecise or ambiguous. Predictions that are too coarse make it
difficult  to  detect  discrepancies  with  observations,  and  this  in  turn  sacrifices 
some  of  the  ability  of  the  program  to  isolate  faults. 

Economical predictions are incomplete, but the indiscriminacy that results
can be alleviated by using multiple levels of behavioral abstraction. If
needed,  more  detailed  predictions  can  be  made  for  only  a  subset  of  the  entire 
device.  This  may  allow  more  discrepancies  to  be  detected  and  thereby  rule 
out  some  diagnoses. 


1.2  A  Troubleshooting  Scenario 

The troubleshooting program described in this report uses a rich and multilayered
circuit model that is designed to address the problems identified
above.  The  model  represents  the  physical  organization  in  terms  of  chips, 
wires,  and  so  forth,  and  represents  the  functional  organization  in  terms  of 
how  its  parts  interact  to  achieve  the  overall  intended  behavior.  Its  levels  of 
detail range from a qualitative model of resistors and switches up to arbitrarily
large computational modules. It represents the behaviors of components
using both traditional digital abstractions and a novel set of temporal abstractions
that describe signals in terms such as cycles, frequency, and change.
Finally,  it  incorporates  knowledge  not  just  about  how  the  circuit  components 
should work, but for a few, how they break and how often. Only one circuit
has been modeled this way, but it is large, complex, internally diverse,
and  real:  a  portion  of  the  Symbolics  3600  Console  Controller  Board  that 
contains  two  microprocessors  (both  running  programs  with  several  hundred 
instructions),  thirty  supporting  chips,  and  one  hundred  sixty  wires. 

Seven  troubleshooting  scenarios  using  this  circuit  will  be  presented  in  this 
document.  One  of  these  scenarios,  presented  here  in  abbreviated  form,  serves 
to  illustrate  the  distinctive  features  of  the  circuit  model  and  the  interaction 
of  the  troubleshooting  program  with  it. 
The Console Controller Board is responsible for transmitting keystrokes
and mouse motions to the host computer and for decoding the video signal
coming from the host for display on a CRT and the audio signal for output to
a  speaker.  Some  keystroke  sequences  can  change  the  volume  of  the  speaker, 
the  brightness  of  the  CRT,  and  so  forth.  Figure  1.3  shows  abstractly  a  few  of 
the  components  (boxes)  and  the  signals  through  which  they  interact  (arrows). 

Figure  1.3:  A  Portion  of  the  Console  Controller  Board 


Each small superscript represents the number of chips in that component;
there are 16 in all. The oscillator O produces a clock signal that is buffered
by B and sent on to two places: the reset circuitry R and a microprocessor
M1. The microprocessor M1 polls the mouse inputs. Each tenth of an
inch of mouse motion along its x or y axes causes M1 to interrupt a second
microprocessor M2 with a two-byte message. M2 responds to the interrupt
through some bus control circuitry D. After receiving the two-byte message
M2 then sends the message on to the host, again through the bus control
circuitry D. The host displays the changed mouse position on the screen.

Suppose  the  Console  Controller  Board  reset  button  is  pressed  and  the 
mouse  rolled  around  for  a  couple  of  seconds.  The  model  predicts  that  if  all 
16  chips  are  working,  then  mouse  motion  will  be  observed  at  Output.  The 
model  is  too  coarse  to  predict  how  fast  or  how  far  the  cursor  will  move  on 
the  screen  —  it  predicts  only  that  motion  will  be  observed.  This  temporally 
abstract  behavior  is  both  more  efficient  to  make  predictions  from  and  easier 
to observe than the traditional clock-cycle-by-clock-cycle model of digital
circuit  behavior. 
But suppose the mouse cursor does not move at all. The program indicates
that any one of the 16 chips might be broken; each chip is a suspect.
There are now many possible signals to probe, and the program ranks them.
The likeliest chip to fail by far is the onboard oscillator O. The program
suggests probing its output; suppose it is observed to have a frequency of
approximately 10 MHz.

The oscillator O can be discounted as an unlikely suspect using knowledge
in the model about how some components fail. The model says that
when  oscillators  fail,  they  usually  fail  catastrophically,  producing  an  output 
frequency  of  0.  Because  the  signal  was  observed  to  be  changing,  the  program 
concludes  that  the  oscillator  chip  is  probably  not  responsible.  It  is  still  a 
suspect,  just  a  relatively  unlikely  one.  This  leaves  15  chips  as  likely  suspects. 
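
One way to picture this discounting is as a Bayes-style reweighting of the suspects. The sketch below is an assumption about how such a ranking could be computed, not a description of the program's actual method, and every number in it is hypothetical: the prior failure rates, and the 5% chance that a broken oscillator would still produce a changing output.

```python
def rerank(priors, likelihood_of_observation):
    """Weight each suspect's prior by how well its failure explains the
    observation, then normalize so the weights sum to one.

    Suspects missing from likelihood_of_observation get weight 1.0, meaning
    their failure is fully compatible with what was observed.
    """
    weighted = {c: priors[c] * likelihood_of_observation.get(c, 1.0) for c in priors}
    total = sum(weighted.values())
    return {c: w / total for c, w in weighted.items()}


# Hypothetical priors: the oscillator O is by far the likeliest to fail.
priors = {"O": 0.40, "B": 0.10, "R": 0.10, "C": 0.10, "M1": 0.10, "M2": 0.10, "D": 0.10}

# Hypothetical failure-mode knowledge: a broken oscillator usually outputs
# frequency 0, so it would show a changing ~10 MHz signal only 5% of the time.
likelihood = {"O": 0.05}

posterior = rerank(priors, likelihood)
```

After the probe the oscillator remains a suspect with nonzero weight, but it ranks below every other component, which matches the qualitative conclusion in the text.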

The  program  now  needs  to  suggest  another  probe.  To  suggest  a  probe  it 
considers  the  predictions  that  the  model  makes  at  each  signal.  For  example, 
the  model  predicted  that  the  output  of  the  oscillator  O  should  have  frequency 
10 MHz, and the probe verified this. The model also predicts that the Clock
signal should have frequency 5 MHz. The representation of these clock signals
in  terms  of  their  frequencies  is  an  example  of  a  temporal  abstraction;  millions 
of  underlying  events  (rising  and  falling  edges)  have  been  abstracted  into  a 
simple  description  that  is  easy  to  reason  about  and  easy  to  observe. 

Although  the  model  represents  many  signals  in  temporally  abstract  ways, 
there  are  other  signals  for  which  the  standard  digital  vocabulary  suffices. 
For  example,  the  Constant  output  of  C  is  a  constant  1  throughout  the  entire 
session,  and  the  model  predicts  that.  Also,  the  Reset  signal  should  be  asserted 
while  the  reset  button  is  pressed  and  unasserted  otherwise,  and  the  model 
predicts  that  as  well. 

These predictions (that the clock frequency is 5 MHz, and so forth)
can be used in subsequent predictions. The temporally abstract behavior
model for the first microprocessor M1 says that if the Clock input is 5 MHz,
the Constant input is 1, and the Reset signal is not asserted, then the microprocessor
is running. While M1 is running, each movement of the mouse
results  in  the  Interrupt  line  being  asserted.  If  all  that  is  known  is  that  the 
mouse  is  moving  around,  the  model  does  not  predict  exactly  when  it  will  be 
asserted;  rather  it  predicts  that  the  signal  will  be  changing  while  the  mouse 
is  moving  and  a  constant  1  value  otherwise. 
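
The temporally abstract behavior model for M1 described above amounts to a pair of simple rules, sketched here in Python. The function names, the 0.5 MHz tolerance on the clock frequency, and the use of None to mean "no prediction" are assumptions made for this illustration; the rules themselves are taken from the preceding paragraph.

```python
NOMINAL_CLOCK_MHZ = 5.0
CLOCK_TOLERANCE_MHZ = 0.5  # hypothetical: how far from 5 MHz still counts as 5 MHz


def m1_running(clock_mhz, constant, reset_asserted):
    """M1 runs if Clock is about 5 MHz, Constant is 1, and Reset is not asserted."""
    clock_ok = abs(clock_mhz - NOMINAL_CLOCK_MHZ) <= CLOCK_TOLERANCE_MHZ
    return clock_ok and constant == 1 and not reset_asserted


def predict_interrupt(clock_mhz, constant, reset_asserted, mouse_moving):
    """Coarse prediction for the Interrupt line.

    Returns None when M1 is not known to be running, because this abstract
    model covers only part of the microprocessor's behavior.
    """
    if not m1_running(clock_mhz, constant, reset_asserted):
        return None
    return "changing" if mouse_moving else "constant 1"
```

The rule says nothing about when each interrupt occurs; it relates one coarse description (mouse moving) to another (signal changing).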

The  model  makes  many  other  predictions,  but  these  are  all  that  will  be 
needed in this example. The important one at the moment is the prediction
that the Interrupt signal will be changing while the mouse is moving. This
prediction depends on eight chips working properly, those in all components
except M2 and D.

The probe that the program now suggests is the Interrupt output of M1.
Suppose  the  interrupt  line  is  probed,  revealing  that  it  is  a  constant  1  even 
while  the  mouse  is  rolled  around.  This  is  a  discrepancy,  since  it  was  supposed 
to  be  changing  so  long  as  those  eight  chips  were  working  properly.  One  of 
the  chips  was  the  oscillator,  which  has  been  shown  to  be  an  unlikely  suspect; 
this  leaves  seven  as  likely  suspects  (Figure  1.4). 


Figure  1.4:  Likely  Suspects  After  Probing  Interrupt 


The  model  predicted  that  the  Reset  signal  should  be  asserted  just  while 
the reset button was pressed, so long as the five chips in O, B and R were
working.  Probing  the  Reset  signal  reveals  that  upon  pressing  the  button  it 
is  asserted,  then  unasserted.  This  means  that  the  chips  in  R  are  no  longer 
suspects,  since  their  failure  could  not  explain  the  observations  made.  Now 
there  are  5  likely  suspects  (Figure  1.5). 


Figure 1.5: Likely Suspects After Probing Reset


The model predicted that the Constant signal should be 1 throughout the
session, so long as the chips in C were working. Probing this signal reveals
that it is indeed 1, so the chips in C are no longer suspects. Now there are
3 likely suspects (Figure 1.6).


Figure 1.6: Likely Suspects After Probing Constant


Finally, a probe of the Clock signal reveals that it has frequency around
5 MHz. The model says that if the clock input to M1 has a high enough
frequency and the reset input is not asserted, then the microprocessor should
be running. This means that the Interrupt signal should be changing, which
contradicts previous observations. Hence M1 is the only remaining suspect
and the program terminates.
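
The narrowing of likely suspects over the whole scenario can be replayed with ordinary set operations. This is a sketch, not the program's diagnostic algorithm. The per-component chip counts are not stated directly in the text; they are inferred here from the totals it gives (16 chips in all, eight supporting the Interrupt prediction, five in O, B and R, four in M2) and should be read as assumptions. Treating a confirmed prediction as clearing a component follows the scenario's own simplification.

```python
# Chips per component, inferred from the totals quoted in the scenario.
CHIPS = {"O": 1, "B": 2, "R": 2, "C": 2, "M1": 1, "M2": 4, "D": 4}

# Components whose chips must all work for the Interrupt prediction to hold.
INTERRUPT_SUPPORT = {"O", "B", "R", "C", "M1"}


def chip_count(components):
    """Total number of chips in a set of components."""
    return sum(CHIPS[name] for name in components)


# Each step pairs a probe result with its effect on the set of likely suspects.
steps = [
    ("probe O: about 10 MHz, so O is discounted", lambda s: s - {"O"}),
    ("probe Interrupt: constant 1, a discrepancy", lambda s: s & INTERRUPT_SUPPORT),
    ("probe Reset: behaves as predicted", lambda s: s - {"R"}),
    ("probe Constant: 1 as predicted", lambda s: s - {"C"}),
    ("probe Clock: about 5 MHz as predicted", lambda s: s - {"B"}),
]

likely = set(CHIPS)  # the cursor does not move: every chip is a suspect
history = [chip_count(likely)]
for description, narrow in steps:
    likely = narrow(likely)
    history.append(chip_count(likely))
```

The recorded counts run 16, 15, 7, 5, 3, 1, matching the figures quoted in the scenario, and the one remaining component is M1.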

The  interesting  thing  about  this  scenario  is  that  it  is  so  simple  compared 
to  the  underlying  complexity  of  the  real  circuit.  The  circuit  is  structurally 
complex;  there  are  thousands  of  transistors  in  the  chips,  hundreds  of  possible 
flaws in the wires alone. It is behaviorally complex; consider all the microprocessor
instruction cycles that occurred during the one second of mouse
motion.  People  can  troubleshoot  the  circuit  without  thinking  about  all  those 
details,  and  the  program  can  troubleshoot  it  without  explicitly  representing 
them. 

The  important  thing  about  the  model  is  not  that  it  uses  abstractions 
to  deal  with  complexity;  any  representation  does  that.  The  important  idea 
is that there are structural and behavioral abstractions appropriate to troubleshooting.
Temporal abstractions, in particular, allow the program to avoid
simulating  long  sequences  of  events  and  instead  reason  in  terms  of  “moving” 
mice,  “running”  clocks,  “changing”  signals,  and  so  forth.  There  are  also 
principles by which those abstractions can be manually applied to a complex
circuit to construct the rich representation that makes troubleshooting
of  complex  devices  tractable.  The  model  of  the  Console  Controller  Board 
is  appropriate  for  model-based  troubleshooting  because  it  was  constructed 
according  to  those  principles. 


1.3  Contributions 

This  thesis  presents  a  methodology,  a  representation,  and  an  implemented 
program  for  troubleshooting  digital  circuit  boards  at  roughly  the  level  of 
expertise  of  a  human  novice. 

The methodological claim is that existing methods for model-based troubleshooting
have not scaled up to deal with complex digital circuits because
traditional  circuit  models  do  not  explicitly  represent  aspects  of  the  device 
that  troubleshooters  would  consider  important.  For  complex  devices  the 
model of the target device should be constructed with the goal of troubleshooting
explicitly in mind.

Given  that  methodology,  there  are  principles  by  which  complex  circuits 
can  be  represented  so  as  to  make  those  important  aspects  explicit  and  thereby 
help  make  the  troubleshooting  task  tractable.  Some  of  the  salient  principles 
follow. 

One  set  of  principles  concerns  how  the  structure  of  a  given  circuit  should 
be  represented. 


•  Components  in  the  representation  of  the  physical  organization  of  the 
circuit  should  correspond  to  the  possible  repairs  of  the  actual  device. 

The representation of physical organization plays a central role in the
troubleshooting program, and the program represents all of its diagnoses in
terms of the physical components that could be damaged. In the scenario
presented earlier, for example, the diagnoses were expressed in terms of chips,
which are “repaired” by replacement. Making the elements of this representation
correspond to possible repair actions ensures that the troubleshooting
program  will  not  waste  effort  trying  to  discriminate  between  diagnoses  that 
have  identical  repairs. 

•  Components  in  the  representation  of  the  functional  organization  of  the 
circuit  should  facilitate  behavioral  abstraction. 

The  only  role  that  an  explicit  representation  of  functional  organization 
plays  in  model-based  troubleshooting  is  to  make  behavior  prediction  more 
efficient.  For  example,  the  only  reason  that  the  component  M2  exists  in  the 
model  is  because  the  combined  behavior  of  the  four  chips  inside  it  can  be 
described  more  simply  in  the  aggregate  than  individually.  In  extracting  the 
functional organization from a raw schematic the modeler need only represent
what  will  make  the  behavior  easiest  to  reason  with,  rather  than  necessarily 
what  the  designer  had  in  mind. 

A  second  set  of  principles  concerns  the  representation  of  circuit  behavior. 

•  The  behavior  of  components  should  be  represented  in  terms  of  features 
that  are  easy  for  the  troubleshooter  to  observe. 

Some  features  of  time-varying  signals  are  easier  to  observe  than  others. 
The  frequency  of  a  clock,  for  example,  is  easier  to  observe  than  the  timing 
of  each  of  its  individual  transitions.  Expressing  the  behavior  of  components 
in  the  terms  that  are  more  easily  observed  is  a  way  of  choosing  where  to 
sacrifice  precision  in  favor  of  efficiency. 

• The behavior of a component for which changes on its inputs always
result in changes on its outputs should be represented in temporally
coarse  terms. 

A powerful representation technique uses relationships between component
inputs and outputs in terms that are stable over long periods of time or
that summarize much activity into a single parameter. In the troubleshooting
scenario, the number of mouse steps [...] a period of seconds (a
single parameter describing much activity) [...] the number of times
the interrupt line would be asserted [...]. Such relationships can
be derived when each individual change [...] more other changes.

• A temporally coarse behavior description that only covers part of the
behavior of a component is better than not covering any at all.

Although  the  full  behavior  of  a  component  may  be  too  complex  to  reduce 
to  a  simple  relationship  between  (say)  the  number  of  changes  on  its  inputs 
and  the  number  of  changes  on  its  outputs,  there  may  be  such  a  relationship 
that involves only a subset of its inputs, assuming that the others are held
constant.  In  the  case  of  the  microprocessor,  for  example,  the  relationship 
between  the  mouse  motion  inputs  and  interrupt  output  holds  only  so  long 
as  the  clock  input  is  running  and  the  reset  input  is  not  asserted.  Since  the 
troubleshooting  program  will  eventually  use  the  more  detailed  behaviors  as 
long  as  the  diagnosis  remains  ambiguous,  no  diagnostic  resolution  will  be  lost 
by  only  representing  a  subset  of  the  possible  behaviors  abstractly. 

• A sequential circuit should be encapsulated into a single component to
enable  the  description  of  its  behavior  in  a  temporally  coarse  way. 

Although the individual behaviors of the components in a sequential circuit
may not lend themselves to temporally coarse descriptions, the loop may
be  performing  a  simple  function  when  taken  as  a  whole.  For  example,  the 
R  component  in  the  troubleshooting  scenario  is  actually  a  sequential  circuit 
with  214  distinct  states.  When  viewed  in  temporally  coarse  terms,  however, 
there  is  a  simple  correspondence  between  the  states  of  the  button  and  the 
state of the output. Encapsulating the group of components makes it possible
to reason about its behavior in a temporally coarse way, and as in the
troubleshooting  scenario  described,  it  may  not  be  necessary  to  ever  consider 
the  details  of  its  behavior. 

A  final  set  of  principles  concerns  what  knowledge  about  failures  should 
be  represented  explicitly. 

•  An  explicit  representation  of  a  given  component  failure  mode  should 
be  used  if  the  underlying  failure  has  high  likelihood. 

Components break in the field in certain ways much more often than
other  ways.  Chips,  for  example,  fail  more  often  with  breaks  in  the  tiny  wires 
that  connect  their  pins  to  the  silicon  chip  inside  than  in  other  ways.  The 
benefit  of  knowledge  about  such  failures  comes  when  they  are  inconsistent 
with  the  symptoms,  since  this  can  reduce  the  ambiguity  among  the  possible 
diagnoses. 

•  An explicit representation of a given component failure mode should be
used  if  the  resulting  misbehavior  is  drastically  simpler  than  the  normal 
behavior  of  the  component. 

If  a  component  with  normally  complex  behavior  has  some  internal  fault  or 
faults  that  cause  it  to  misbehave  catastrophically,  then  any  partially  correct 
behavior  observed  for  the  component  makes  it  a  less  likely  suspect.  In  the 
troubleshooting  example,  the  oscillator  was  known  to  fail  in  a  way  that  made 
it  produce  a  zero  output  frequency,  and  that  misbehavior  was  easy  to  rule 
out  even  though  the  measurement  of  its  output  was  imprecise.  The  benefit  of 
knowledge  about  these  failure  modes  is  especially  great  when  the  misbehavior 
has  high  likelihood  as  well. 

The implemented model of the Console Controller Board is a concrete
embodiment of the methodology and representation principles. The troubleshooting
program that uses that model is an extension of standard model-based
troubleshooting technology, incorporating solutions to the technical problems
of (i) hierarchic diagnosis with multiple and tangled hierarchies, (ii) integration
of explicit knowledge about failure modes into a framework for diagnosing
multiple faults, and (iii) troubleshooting circuits with time-dependent
behavior.


1.4  Organization 

This  document  is  primarily  organized  by  the  different  kinds  of  circuit  knowl¬ 
edge  to  be  represented.  Preliminary  background  material  is  contained  in 
Chapter  2,  which  presents  an  overview  of  knowledge-based  automated  di¬ 
agnosis,  especially  model-based  troubleshooting.  Chapter  3  presents  the 
troubleshooting  scenarios  for  the  Console  Controller  Board  so  as  to  pro¬ 
vide  context  for  the  many  details  to  follow.  The  next  four  chapters  contain 
the  essential  ideas.  Chapter  4  presents  a  representation  for  circuit  structure 
motivated by troubleshooting requirements. Chapter 5 contains the bulk
of  the  document  and  describes  a  representation  for  circuit  behavior  using 
multiple  temporal  abstractions  and  a  temporal  reasoning  program  for  pre¬ 
dicting  behavior  using  those  same  abstractions.  Chapter  6  describes  how 
faults  and  misbehaviors  are  modeled  and  how  this  knowledge  is  used  by  the 
troubleshooting  program  to  heuristically  discount  unlikely  diagnoses.  Chap¬ 
ter  7  presents  the  details  of  the  troubleshooting  engine  and  how  it  interacts 
with  the  choices  made  in  representing  circuit  structure  and  behavior.  Fi¬ 
nally,  Chapter  8  summarizes  and  presents  ideas  for  future  work.  Sections  on 
related  research  are  distributed  throughout  the  individual  chapters. 


Chapter  2 
Background 


A  number  of  knowledge-based  programs  for  automated  diagnosis  have  been 
built  for  a  variety  of  domains  using  a  variety  of  implementation  technologies. 
These  programs  can  be  characterized  by  the  knowledge  that  they  represent 
explicitly:  (i)  associations  between  underlying  diseases  or  faults  and  their 
consequences  for  the  system  as  a  whole,  as  opposed  to  (ii)  knowledge  about 
the  parts  of  the  system  and  how  they  interact  to  produce  its  overall  behavior. 
In  medical  diagnosis,  for  example,  the  contrast  is  between  knowledge  about 
diseases  and  their  symptoms  versus  knowledge  about  the  underlying  mecha¬ 
nism;  it  is  the  difference  between  knowledge  that  emphysema  causes  shortness 
of breath versus knowledge that CO2 exchange is proportional to the surface
area of the alveoli. Programs that rely on the former type of knowledge will
be termed symptom-based and the latter model-based. A number of programs
incorporate  both  kinds  of  knowledge,  but  for  any  given  program  it  is  typically 
clear  which  one  predominates.  A  brief  review  of  each  paradigm  is  presented 
below. One particular program for model-based diagnosis will be presented
in  some  detail,  since  it  provides  the  basis  for  the  troubleshooting  technology 
in  this  report. 



2.1  The  Symptom-Based  Approach 

One  approach  to  automated  diagnosis  is  to  organize  the  program  as  a 
database  that  associates  underlying  diseases  (faults)  with  their  outward 
symptoms  (manifestations).  To  find  the  underlying  problem  from  a  set  of 
symptoms requires straightforward lookup or pattern matching. The notion
of  a  “fault  dictionary”  is  the  canonical  example  of  this  approach.  The  princi¬ 
pal  difficulty  in  this  approach  revolves  around  the  coverage  of  diseases  in  the 
knowledge base. First, associations between single diseases and their symptoms
do not easily support reasoning about interactions between diseases.
Second, even if multiple simultaneous diseases can be handled, the program
is  limited  to  considering  those  individual  diseases  that  were  anticipated  and 
explicitly  included  by  the  knowledge  base  builder  —  there  is  no  theory  about 
how  to  enumerate  the  possible  diseases  of  a  given  system.  Third,  given  a 
knowledge  base  intended  to  be  used  for  diagnosing  a  particular  system  there 
is  no  principled  way  to  modify  the  knowledge  base  when  there  has  been  a 
change  in  the  design  (or  in  our  understanding)  of  that  system.  Although  the 
paradigm  has  these  inherent  limitations  and  is  not  used  here,  some  important 
techniques  that  generalize  beyond  it  were  first  developed  within  this  tradi¬ 
tion:  techniques  for  dealing  with  uncertainty,  for  organizing  large  knowledge 
bases,  and  for  dealing  with  multiple  diseases.  These  techniques  are  each 
treated  briefly  below. 

2.1.1  Dealing  with  Uncertainty 

The  notion  of  a  disease-symptom  database  requires  some  elaboration  in  do¬ 
mains  for  which  the  underlying  diseases  have  widely  varying  likelihoods  and 
for which the associations between diseases and symptoms are less than certain.
One approach is to assign prior probabilities to the diseases, assign
conditional  probabilities  to  the  symptoms  given  each  disease,  and  use  Bayes’ 
Theorem  to  find  the  likeliest  disease  given  a  set  of  symptoms  [Szolovits78]. 
Many  automated  diagnosis  systems  use  statistical  information  in  this  form  in 
spite  of  the  large  number  of  conditional  probabilities  needed  when  diseases 
or  symptoms  are  not  independent.  One  reason  for  the  enduring  popularity  of 
the  probabilistic  framework  is  that  it  allows  the  use  of  decision  theoretic  tech¬ 
niques  to  choose  observations  that  are  most  likely  to  reduce  the  ambiguity 
among  competing  diagnoses.  Estimating  ambiguity  using  Shannon  entropy 
and  choosing  the  next  observation  based  on  a  one-ply  lookahead  turns  out  to 
provide  good  results  on  average  [Gorry73].  A  non-Bayesian  approach  to  deal¬ 
ing  with  uncertain  knowledge  is  taken  by  the  MYCIN  program  [Shortliffe76], 
which  computes  “certainty  factors”  for  its  conclusions,  but  it  suffers  from  the 
same  difficulties  with  interacting  diseases  as  Bayesian  approaches. 
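
The Bayesian update and the entropy-based choice of the next observation described above can be sketched as follows. This is only an illustration under simplifying assumptions that the text does not make: diseases are treated as mutually exclusive, symptoms as conditionally independent given the disease, and every name and probability (`PRIOR`, `LIKELIHOOD`, "flu", "cold") is hypothetical rather than taken from any of the cited programs.

```python
import math

def posterior(prior, likelihood, findings):
    """Bayes' Theorem over mutually exclusive diseases.

    prior:      {disease: P(disease)}
    likelihood: {disease: {symptom: P(symptom present | disease)}}
    findings:   {symptom: True or False} for symptoms observed so far
    Assumes symptoms are conditionally independent given the disease.
    """
    scores = {}
    for disease, p in prior.items():
        for symptom, present in findings.items():
            p_s = likelihood[disease][symptom]
            p *= p_s if present else (1.0 - p_s)
        scores[disease] = p
    total = sum(scores.values())
    return {d: p / total for d, p in scores.items()}

def entropy(dist):
    """Shannon entropy (in bits) of a distribution over diagnoses."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def best_observation(prior, likelihood, findings, unobserved):
    """One-ply lookahead: choose the symptom whose observation is
    expected to leave the least entropy among the diagnoses."""
    current = posterior(prior, likelihood, findings)

    def expected_entropy(symptom):
        p_yes = sum(current[d] * likelihood[d][symptom] for d in current)
        total = 0.0
        for outcome, p_out in ((True, p_yes), (False, 1.0 - p_yes)):
            if p_out > 0:
                after = posterior(prior, likelihood, {**findings, symptom: outcome})
                total += p_out * entropy(after)
        return total

    return min(unobserved, key=expected_entropy)

# Hypothetical two-disease example: fever discriminates, cough does not.
PRIOR = {"flu": 0.5, "cold": 0.5}
LIKELIHOOD = {
    "flu": {"fever": 0.9, "cough": 0.5},
    "cold": {"fever": 0.1, "cough": 0.5},
}
```

With these numbers, observing a fever raises the probability of "flu" to 0.9, and `best_observation(PRIOR, LIKELIHOOD, {}, ["fever", "cough"])` prefers "fever", because a cough is equally likely under both diseases and so cannot reduce the ambiguity.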

2.1.2 Organizing Knowledge

Obtaining diagnostic coverage of any interesting domain requires the maintenance
of a large knowledge base. This in turn implies the need for principles
for organizing this knowledge. Organizing knowledge about diseases, symptoms,
and diagnostic procedures into frames [Minsky75] appears in the diagnosis
program PIP [Pauker76]. The use of frames implies no commitment as
to whether knowledge about diseases, symptoms, or causal mechanisms will
be stored; rather it allows modularization of the knowledge base and thereby
simplifies its maintenance. The organization of diseases and their symptoms
into specialization hierarchies, as in the internal medicine diagnosis program
INTERNIST [Pople82], is an elaboration of this idea. A hierarchic organization
makes only a minimal commitment to the character of the knowledge,
but it does allow the program to deal with groups of related diseases more
efficiently. A stronger organizing principle appears in the glaucoma diagnosis
program CASNET [Kulikowski82], in which knowledge is organized around
disease  states  and  their  temporal  progression.  This  network  of  states  and 
their  successor  relationships  was  intended  to  represent  a  causal  explanation 
of  the  disease.  Although  the  use  of  this  knowledge  in  CASNET  is  probabilistic 
and  not  substantially  different  from  other  symptom-based  programs,  it  was 
recognized  that  causality  could  be  a  powerful  organizing  principle  because 
the  knowledge  acquired  from  domain  experts  is  often  couched  as  categorical 
explanation  that  can  be  translated  into  causal  terms. 

2.1.3  Diagnosing  Multiple  Diseases 

Among  the  most  difficult  cases  in  medicine  and  other  diagnostic  tasks  are 
those  in  which  more  than  one  underlying  disease  or  fault  is  present.  One  ap¬ 
proach  is  to  assume  that  all  underlying  diseases  are  statistically  and  causally 
independent.  The  program  can  then  simply  evaluate  the  likelihood  of  ev¬ 
ery  disease  individually.  This  approach  is  taken  in  MYCIN  [Shortliffe76]  but 
it  requires  such  strong  independence  assumptions  that  it  is  only  feasible  in 
restricted  domains.  Another  approach  is  taken  by  INTERNIST  [Pople82], 
in  which  diagnoses  are  incrementally  constructed  by  repeatedly  choosing  a 
disease  that  explains  the  most  unexplained  symptoms,  until  there  are  no 
unexplained  symptoms  left.  While  intuitively  appealing,  this  does  not  guar¬ 
antee  coverage  of  the  possible  disease  combinations.  The  approach  used  in 
[Reggia83] addresses this coverage problem by considering every set of diseases
whose combined symptoms cover all and only the observed symptoms.
By  Occam’s  razor,  the  hypotheses  that  should  be  considered  are  the  minimal 
covering  sets  —  those  that  do  not  include  diseases  not  needed  to  explain  the 
symptoms. Using probabilistic knowledge, the likeliest of the minimal combinations
is then chosen as the preferred diagnosis. Each of these approaches,
however, performs poorly when the symptoms of the various diseases interact.
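
The idea of minimal covering sets can be illustrated with a brute-force sketch. It is not the algorithm of [Reggia83]; it simply enumerates disease combinations by increasing size and keeps those with no redundant member. As a simplification it checks only that every observed symptom is explained, not the "and only" half of the condition, and the disease and symptom names are hypothetical.

```python
from itertools import combinations

def minimal_covers(causes, observed):
    """Return every irredundant set of diseases that explains `observed`.

    causes:   {disease: set of symptoms that the disease can produce}
    observed: iterable of observed symptoms
    Exhaustive search, so only suitable for small knowledge bases.
    """
    observed = set(observed)
    names = sorted(causes)
    covers = []
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            chosen = set(combo)
            # Skip supersets of a cover already found: they contain
            # diseases that are not needed to explain the symptoms.
            if any(cover <= chosen for cover in covers):
                continue
            explained = set().union(*(causes[d] for d in combo))
            if observed <= explained:
                covers.append(chosen)
    return covers

# Hypothetical knowledge base.
CAUSES = {"d1": {"a", "b"}, "d2": {"b", "c"}, "d3": {"c"}}
```

For the observed symptoms "a" and "c", no single disease suffices, and the minimal covers are {d1, d2} and {d1, d3}; the three-disease set is rejected as redundant. A probabilistic ranking of the surviving sets would then select the preferred diagnosis.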

2.1.4  Summary  of  Symptom-Based  Approaches 

Work  on  symptom-based  programs  for  automated  diagnosis  has  yielded  a 
number  of  powerful  and  useful  techniques.  These  include  (i)  observation  and 
test selection based on decision theory, with entropy as the heuristic evaluation
function, (ii) the use of causality as an organizing principle for diagnostic
knowledge,  and  (iii)  the  formalization  of  diagnosis  in  terms  of  covering  sets, 
allowing  for  diagnosis  of  multiple  simultaneous  diseases.  The  principal  diffi¬ 
culty  with  symptom-based  approaches  is  that  the  correctness  and  coverage  of 
the  knowledge  base  is  difficult  to  guarantee,  especially  in  the  face  of  changes 
to  the  underlying  system.  When  the  available  domain  theory  is  weak,  with 
only  empirical  associations  between  underlying  diseases  and  observable  symp¬ 
toms,  the  symptom-based  approach  is  reasonable  and  can  be  successful.  Its 
limitations,  however,  motivate  the  model-based  approach  discussed  below, 
which  can  provide  better  coverage  and  extensibility  in  domains  where  those 
properties  are  important. 


2.2  The  Model-Based  Approach 

Model-based  troubleshooting  is  a  widely  investigated  and  well  established 
methodology.  The  majority  of  the  programs  that  share  this  paradigm  are 
for  diagnosis  of  designed  artifacts  such  as  circuits,  so  the  term  “device”  will 
be  used  interchangeably  with  “system,”  and  the  notion  of  a  “disease”  will 
be  replaced  by  that  of  a  “fault.”  The  key  to  the  model-based  approach  is 
the  representation  of  the  structure  and  behavior  of  the  correctly  functioning 
device.  This  representation  is  used  to  make  predictions  about  the  behavior 
of  the  real  device  and  about  the  outcomes  of  possible  observations.  Dis¬ 
crepancies  between  the  predicted  behavior  and  the  actual  observations  are 
traced to sets of possibly malfunctioning components. Each set of components
whose failure could explain the observations will be called a candidate;
these  candidates  can  be  ranked  according  to  their  relative  likelihood.  As 
new  discriminating  observations  are  added  some  candidate  will  eventually 
dominate  the  others  and  be  chosen  as  the  final  diagnosis,  a  set  of  compo¬ 
nents  believed  to  be  failing.  With  a  hierarchic  representation  of  structure, 
the  isolation  process  can  be  repeated  recursively  on  the  substructure  of  each 
component  believed  to  be  faulty. 

The  key  advantage  of  using  knowledge  about  the  correct  behavior  of  com¬ 
ponents  is  that  it  dispenses  with  the  need  for  storing  associations  between 
underlying  faults  and  observed  misbehaviors  of  the  entire  device.  Instead, 
any  subset  of  components  whose  predicted  combined  behavior  disagrees  with 
the  behavior  actually  observed  contains  at  least  one  broken  component.  By 
gathering  more  observations  the  troubleshooter  can  narrow  down  this  set. 
Furthermore,  this  requires  no  commitment  to  the  number  of  faults  actually 
in  the  device,  since  the  model  can  support  reasoning  about  the  interactions 
between  any  number  of  failing  components.  Finally,  as  noted  earlier,  when 
causal  knowledge  is  available  it  can  be  easier  to  obtain  than  knowledge  about 
overall  associations  between  symptoms  of  failures  and  possible  underlying 
faults. 

It is useful to consider model-based troubleshooting in terms of four basic
activities: modeling, behavior prediction, candidate generation, and
discrimination. The following sections discuss each of these activities; a more
complete  survey  appears  in  [Hamscher87]. 

2.2.1  Modeling 

In  model-based  troubleshooting  the  notion  of  a  “device  model”  is  almost 
universally  understood  to  mean  a  lumped  element  description,  that  is,  the 
structure  of  the  device  is  represented  as  a  network  of  typed  components  and 
connections  between  them.  Examples  of  models  in  various  domains  include: 

•  Circuit  schematics  with  resistors,  diodes,  and  so  forth.  This  rep¬ 
resentation  of  analog  circuits  is  used  in  INTER  [deKleer76],  WAT¬ 
SON  [Brown76],  SOPHIE  [Brown82],  IDS  [Pan84],  IN-ATE  [Cantone83], 
DEDALE  [Dague87],  and  others  [Milne85]. 

•  Circuit schematics using logic gates and higher-level digital components
such as multiplexors and adders. This representation is
used in HT [Davis84], DART [Genesereth84], and others [Friedman83]
[AbuHanna88].

•  Piping  and  instrumentation  diagrams,  which  include  components  such 
as  valves,  potentiometers,  lamps,  and  so  forth,  used  in  LES/LOX 
[Scarl85]. 

•  Models  of  human  physiology.  Fluid  models  in  terms  of  compartments, 
their  permeable  membranes,  and  so  forth  were  used  by  ABEL  [Patil81], 
by  the  system  proposed  in  [Kuipers84],  and  by  the  Heart  Failure  Pro¬ 
gram  [Long86].  A  model  of  the  human  nervous  system  in  terms  of 
unidirectional  neural  pathways  was  used  in  LOCALIZE  [First82]. 

The  behavior  of  the  entire  device  is  taken  to  arise  from  the  interaction  of 
the  behaviors  of  the  individual  components  through  the  connections.  Devel¬ 
oping  a  particular  description  involves  choosing  a  vocabulary  of  components 
and  their  behaviors,  then  representing  the  device  as  a  connected  network  of 
these  components.  Therein  lies  a  key  advantage  of  model-based  troubleshoot¬ 
ing  over  traditional  approaches:  for  designed  artifacts,  it  can  work  directly 
from  device  models  already  developed  for  design  and  analysis.  Model-based 
circuit  troubleshooting,  for  example,  can  in  principle  work  from  ordinary 
circuit  schematics  and  board  layout  information  needed  for  design  and  man¬ 
ufacture.  Therein  also  lie  some  of  the  deepest  problems  in  the  methodology: 
identifying  the  principles  for  building  device  models  that  are  appropriate  for 
model-based  troubleshooting  when  the  inherited  models  are  inappropriate. 
Indeed,  one  of  the  reasons  that  there  are  relatively  few  projects  using  the 
model-based  approach  in  medical  domains  is  the  scarcity  of  good  analytic 
models  for  any  substantial  system. 

The modeler confronts three goals simultaneously: achieving fidelity,
precision, and efficiency. A model has fidelity when it does not support incorrect
predictions  about  the  device.  A  model  has  precision  to  the  extent  that  the 
predictions  it  makes  are  strong  enough  to  be  falsifiable  by  observations  of  the 
actual  device.  A  model  is  efficient  when  the  work  needed  to  make  predictions 
using  it  is  proportional  to  the  benefits  to  be  gained. 

Fidelity  is  the  primary  modeling  goal  in  troubleshooting.  This  is  because 
if the model makes incorrect predictions, then discrepancies between the actual
device and the model will be wrongly blamed on failures in the device.
One way of ensuring fidelity is to (i) ensure that the primitive elements of
the model support correct predictions about the corresponding primitive elements
of the device when they are in isolation, and (ii) ensure that the ways
in which the primitives can be composed preserve fidelity, just as the composition
of the real elements of the device does not change those elements. This
is  the  basic  idea  behind  the  principle  of  no  function  in  structure  [deKleer84]. 
No  function  in  structure  means  that  the  description  of  a  component  behavior 
may  not  rely  on  the  correct  functioning  of  the  whole  device. 

Ensuring  that  the  models  of  primitive  components  are  correct  in  isolation 
involves making sure that all the ways they could interact with other components
are represented explicitly. For example, to say that “when the switch
is closed current will flow” is incorrect because it neglects the fact that a
voltage  drop  is  required  for  current  to  flow.  It  also  neglects  to  mention  that 
if  there  is  a  temperature  differential  between  its  terminals  there  will  be  a 
conductive  heat  flow.  Nor  does  it  mention  that  if  the  switch  were  shorted  to 
some wire elsewhere in the circuit then current could flow through that short.
Any  such  interaction  not  represented  in  the  model  is  a  potential  source  of 
misdiagnoses,  and  the  more  interactions  left  out,  the  worse  the  problem. 

Precision  is  another  modeling  goal.  A  trivial  device  model  would  make 
no  predictions  at  all;  it  has  fidelity,  since  it  makes  no  false  predictions,  but 
it  is  useless  for  troubleshooting  because  it  cannot  produce  any  discrepancies 
either. A useful model produces predictions that can be confirmed or denied
with  the  available  observations. 

Finally,  the  more  precise  the  model  and  the  greater  its  fidelity,  the  less 
efficient  it  is  to  use.  Consider  simulating  any  substantial  digital  circuit  with 
component  models  that  included  not  only  voltages  and  currents  in  the  wires 
and  transistors,  but  the  temperature  and  specific  heat  of  each  contiguous 
piece  of  metal  and  semiconductor,  the  electromagnetic  interactions  with  ev¬ 
ery  other  component,  and  so  forth.  A  model  with  so  much  detail  is  obviously 
impractical  and  highlights  the  key  dilemma  for  the  modeler:  how  to  sacrifice 
fidelity  and  precision  in  ways  that  gain  efficiency.  Of  these,  sacrificing  fidelity 
is  more  serious,  since  it  results  in  incorrect  diagnoses,  while  sacrificing  pre¬ 
cision  only  results  in  ambiguity  among  different  diagnoses.  Interactions  can 
be  ignored  for  which  only  unlikely  failures  would  make  the  interactions  have 
noticeable  effects.  In  the  switch  example  earlier,  being  shorted  to  another 
wire  in  the  circuit  is  possible  and  could  have  noticeable  effects  on  the  switch, 
but if shorts are unlikely failures in general it is reasonable to ignore that
possible  interaction  in  the  switch  model. 

2.2.2  Behavior  Prediction 

The  prediction  task  encompasses  any  categorical  reasoning  about  the  state  of 
the  device  based  on  observations  of  its  behavior.  Given  a  device  model  built 
up  as  a  network  of  components  each  with  its  own  local  behavior  description, 
to  a  first  approximation  behavior  prediction  can  be  done  by  propagating 
many  individual  predictions  local  to  each  component. 

For  example,  suppose  both  inputs  to  an  adder  component  Adder-1  are 
believed  to  be  2  (Figure  2.1).  The  output  of  the  adder  can  then  be  computed 
using  only  local  knowledge  about  its  intended  behavior.  Similarly,  the  output 
of  Adder-2  can  be  predicted  using  its  two  inputs. 


Figure  2.1:  Behavior  Prediction  Example 


Behavior  prediction  in  that  case  is  simply  a  kind  of  simulation:  conclu¬ 
sions  about  the  adder  outputs  were  based  on  their  inputs.  However,  the 
behavior  model  need  not  only  predict  outputs  from  inputs,  but  can  enforce 
any  logical  relationship  between  the  values  carried  by  connections  in  the  de¬ 
vice.  For  example,  if  one  input  to  Adder-2  is  4,  and  the  output  is  6,  then 
the  other  input  is  predicted  to  be  2  (Figure  2.2).  Similarly,  if  one  input  to 
Adder-1  is  2,  then  the  other  input  is  deduced  to  be  0. 
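
The two-adder example can be sketched as a small constraint propagator in which each adder enforces the relationship in1 + in2 = out in whichever direction the known values allow. The wire names `A`, `B`, and `C` for the external inputs are assumptions made for illustration (the text names only the outputs X and Y), each adder is assumed to connect three distinct wires, and the sketch is not the prediction machinery of any cited program.

```python
def propagate(adders, values):
    """Propagate known wire values through adder constraints.

    adders: list of (in1, in2, out) wire-name triples, each enforcing
            in1 + in2 == out; the three wires are assumed distinct
    values: {wire: number} for the wires whose values are known
    Returns a new dict extended with every value that can be deduced.
    Raises ValueError if a fully known adder violates its constraint.
    """
    values = dict(values)
    changed = True
    while changed:
        changed = False
        for a, b, out in adders:
            known = [w for w in (a, b, out) if w in values]
            if len(known) == 3:
                if values[a] + values[b] != values[out]:
                    raise ValueError(f"discrepancy at adder ({a}, {b}, {out})")
            elif len(known) == 2:
                if out not in values:        # simulate: inputs to output
                    values[out] = values[a] + values[b]
                elif a not in values:        # infer first input
                    values[a] = values[out] - values[b]
                else:                        # infer second input
                    values[b] = values[out] - values[a]
                changed = True
    return values

# Adder-1 computes X from A and B; Adder-2 computes Y from X and C.
ADDERS = [("A", "B", "X"), ("X", "C", "Y")]
```

Given A = 2, B = 2, and C = 4, propagation yields X = 4 and Y = 8, which is ordinary simulation. Given instead A = 2, C = 4, and an observed Y = 6, the same constraints run backwards and yield X = 2 and then B = 0, matching the reasoning from effects to causes described above.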

Figure 2.2: Reasoning from Effects to Causes

The technique of predicting behavior by accumulating local predictions
can be extended to reasoning about time-dependent behavior. For example,
when all the inputs and initial state of a flip-flop are known over a certain
interval of time, the outputs can be predicted over that interval
as well. The obstacle that behaviorally complex devices present is that in
general  this  means  explicitly  computing  and  representing  every  event.  In 
the  flip-flop  example,  the  events  in  question  are  changes  of  boolean  value. 
Reasoning  about  the  behavior  of  a  digital  circuit  over  any  appreciable  length 
of  time  is  impractical;  the  culprit  is  the  sheer  number  of  clock  transitions 
and  consequent  changes  of  state  that  might  be  involved.  Devices  with  com¬ 
plex  time-dependent  behavior  motivate  the  use  of  abstractions  that  allow 
predictions  to  be  made  without  having  to  explicitly  construct  such  extensive 
sequences. 

For  efficiency,  the  nominal  behavior  of  the  device  given  some  standard 
stimuli  may  be  stored  as  part  of  the  model.  In  the  model  of  human  acid-base 
and  electrolyte  equilibrium  in  ABEL  [Patil81],  for  example,  each  parameter 
of  the  model  has  an  expected  value  assuming  normal  patient  activity  (for 
example,  normal  fluid  intake).  Similarly,  the  troubleshooting  systems  of 
[Cantone83]  and  [Milne85]  store  nominal  values  at  circuit  nodes  for  each 
of  a  fixed  set  of  tests.  This  is  at  least  a  partial  solution  to  the  problem 
of  expensive  predictions.  This  thesis  takes  a  different  approach,  focusing 
instead  on  having  abstractions  that  will  support  economical  prediction. 

2.2.3  Candidate  Generation 

When  discrepancies  are  found  between  the  observed  behavior  and  the  be¬ 
havior  predicted  by  the  device  model,  candidate  generation  produces  one  or 
more  explanations  for  those  discrepancies.  There  are  at  least  three  ways  of 
approaching  this  task. 

The  first  technique  is  to  associate  with  each  prediction  made  in  the  model 
the sets of components whose correct behavior would support that prediction.
For example, the prediction in Figure 2.1 that the output of the second
adder is 8 would be supported by the set of components {Adder-1, Adder-2}.
With this supporting information, each discrepancy can be explained by
the  failure  of  one  or  more  of  the  components  in  those  sets.  For  example,  if 
the  output  was  observed  to  be  6  instead  of  8,  then  at  least  one  of  those  two 
components  is  broken.  If  there  are  several  discrepancies,  then  the  broken 
components  must  form  a  covering  set  (as  in  [Reggia83],  where  the  symptoms 
of  the  diseases  present  must  form  a  covering  set  of  all  observed  symptoms).  If 
a single failure is assumed, then the candidates form the intersection of the supporting sets. There
are  differences  in  the  machinery  —  especially  in  the  way  that  dependencies 
between  predictions  and  components  that  support  them  are  recorded  —  but 
this  idea  is  at  the  core  of  the  candidate  generation  procedures  in  [deKleer76], 
[Brown82],  [Davis84],  [Genesereth84],  [Scarl85],  [deKleer87],  and  [Dague87]. 
The  details  of  GDE  [deKleer87]  will  be  presented  shortly,  since  it  provides 
the  basis  for  the  program  in  this  report.  [Ginsberg86]  and  [Reiter87]  provide 
formal  interpretations  for  this  technique  based  on  the  notion  that  broken 
components  are  abnormal  and  the  preferred  diagnoses  are  those  requiring 
the  minimal  abnormalities.  An  important  advantage  of  this  technique  is 
that  it  requires  no  information  about  how  certain  components  might  fail; 
only  the  correct  behavior  needs  to  be  known. 
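
The covering-set view of candidate generation can be sketched as a search for minimal sets of components that intersect every supporting set of a violated prediction. This exhaustive version is an illustration only, not the procedure used by GDE or the other cited programs; the function name, the `max_size` parameter, and the component `Adder-3` are hypothetical.

```python
from itertools import combinations

def generate_candidates(conflicts, max_size=None):
    """Return the minimal sets of components that could all be broken.

    conflicts: list of sets; each set holds the components whose correct
               behavior supported a prediction that was contradicted
    max_size:  optional bound on candidate size (1 = single-fault case)
    A candidate must share at least one component with every conflict.
    """
    components = sorted(set().union(*conflicts))
    limit = max_size or len(components)
    found = []
    for size in range(1, limit + 1):
        for combo in combinations(components, size):
            chosen = set(combo)
            if any(f <= chosen for f in found):
                continue  # a smaller candidate already accounts for this
            if all(chosen & conflict for conflict in conflicts):
                found.append(chosen)
    return found
```

For the single discrepancy in the text, `generate_candidates([{"Adder-1", "Adder-2"}])` returns the two single-component candidates. With a second, hypothetical discrepancy supported by {Adder-2, Adder-3}, the candidates become {Adder-2} and {Adder-1, Adder-3}; restricting the search with `max_size=1` leaves only {Adder-2}, which is the intersection of the two supporting sets.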

A second technique extends the first by taking advantage of fault models,
that is, knowledge about how individual components fail. After finding components
whose failure could explain all discrepancies, the effects of known
failure types in those components are simulated. If the set of known failures
is treated as exhaustive, then candidates can be exonerated by fault simulation.
For example, suppose some wire is a candidate. Wires fail only by
breaking,  so  the  program  could  simulate  the  effects  of  that  wire  becoming 
an  open  circuit  and  check  whether  that  is  consistent  with  the  observations. 
If  it  is  not  consistent,  the  wire  would  be  exonerated.  This  technique  is  used 
in  SOPHIE  [Brown82]  and  several  other  model-based  troubleshooting  pro¬ 
grams.  IDS  [Pan84]  goes  further  and  explicitly  models  component  failures 
in a way that allows dependent failures (failures caused by prior failures)
to be explicitly represented and diagnosed. The additional power that
fault  models  provide,  however,  comes  at  a  high  price,  since  it  is  difficult  to 
provide  an  exhaustive  list  of  failures  for  anything  other  than  the  simplest  of 
components. 

The third technique generates alternative explanations for each discrepancy
incrementally, as in ABEL [Patil81]. For example, if in Figure 2.1 the
output  had  been  observed  to  be  6  instead  of  8  as  expected,  among  the  ini¬ 
tial  possibilities  are  that  Adder-2  is  broken,  that  one  or  both  of  its  inputs 
are  lower  than  expected,  and  that  one  of  the  inputs  is  higher  than  expected 
and  the  other  lower.  Some  of  these  are  inconsistent  with  the  observations 
(for  example,  one  of  its  inputs  is  known  to  be  4)  and  are  discarded;  the 
others  survive  to  be  further  elaborated.  The  knowledge  about  the  system  is 
the  same  as  that  available  to  the  previous  technique;  the  difference  is  that 
generating  candidates  and  using  fault  models  to  check  their  consistency  is 
interleaved.  The  advantage  of  doing  so  becomes  evident  when  diagnosing  a 
system  with  feedback  or  with  high  connectivity  between  its  components.  If 
only  knowledge  about  correct  behavior  is  used  then  almost  any  discrepancy 
can  be  accounted  for  by  the  failure  of  any  component  [Hamscher84].  Sub¬ 
sequent  reasoning  with  fault  models  can  constrain  the  possibilities,  but  it 
is  inefficient  to  go  through  the  intermediate  stage  of  generating  all  possible 
candidates,  and  the  interleaving  avoids  it. 

The  program  described  in  this  report  is  based  on  the  first  of  the  above 
techniques,  as  implemented  in  GDE  [deKleer87].  This  approach  begins  with 
an  augmentation  of  behavior  prediction.  Each  local  prediction  is  tagged 
with  the  set  of  components  on  whose  correct  behavior  it  depends,  so  that 
when  an  observation  is  made  that  contradicts  what  the  model  predicted,  the 
components responsible can be easily found. Each of these predictions is
only valid if one or both adders are assumed to be working normally, and
each prediction is tagged with the minimal sets of assumptions that support
it. For example, suppose both inputs to an adder component Adder-1 are 2
(Figure 2.3).

Neither  input  to  Adder-1  requires  any  assumptions,  so  their  tags  are  {}. 
The  prediction  that  the  output  X  is  4  relies  on  the  assumption  that  Adder-1 
is  working  normally  along  with  all  assumptions  supporting  the  inputs,  so  it 
is tagged with the set {Adder-1}. Each such set of assumptions is called
an environment. The prediction that the output Y is 8 is tagged with the
environment  containing  the  assumptions  that  Adder-1  and  Adder-2  are  both 
working.  Observations  such  as  those  at  the  inputs  of  Adder-1  are  true  in  the 
empty  environment  since  they  rely  on  no  assumptions. 
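
The tagging of predictions with environments can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the representation used in GDE or in the program described in this report: the name adder_forward and the encoding of a tagged value as a (value, environment) pair are hypothetical, and the second input to Adder-2 is taken to be an observed 4, as in the running example.

```python
# Hypothetical sketch: a tagged value is a pair (value, environment), where the
# environment is the set of components assumed to be working.
EMPTY = frozenset()  # observations rely on no assumptions


def adder_forward(name, a, b):
    """Predict an adder's output from two tagged inputs."""
    (value_a, env_a), (value_b, env_b) = a, b
    # The prediction depends on this adder working, plus whatever
    # assumptions already supported the two inputs.
    return value_a + value_b, env_a | env_b | {name}


# Both inputs to Adder-1 are observed to be 2.
x = adder_forward("Adder-1", (2, EMPTY), (2, EMPTY))
# Adder-2 adds X to an observed 4 (assumed wiring of the two-adder device).
y = adder_forward("Adder-2", x, (4, EMPTY))

print(x)
print(y)
```

Under these assumptions X carries the environment {Adder-1} and Y carries {Adder-1, Adder-2}, matching the tags described in the text.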

Figure 2.3: Behavior Prediction Example


Recall that the behavior model need not only predict outputs from inputs,
but can enforce any logical relationship between the values carried by
connections in the device. Such predictions are tagged with sets of assumptions
just  as  before.  For  example,  if  one  input  to  Adder-2  is  4,  and  the  output  is 
6,  then  the  other  input  is  predicted  to  be  2  and  tagged  with  the  assumption 
that  Adder-2  is  working  (Figure  2.4).  Similarly,  if  one  input  to  Adder-1  is  2, 
then  the  other  input  is  deduced  to  be  0  and  that  prediction  is  tagged  with 
the  assumptions  that  Adder-1  and  Adder-2  are  working. 


Figure  2.4:  Reasoning  from  Effects  to  Causes 


Candidate  generation  involves  detecting  discrepancies  and  determining 
which  components  could  have  been  responsible.  Discrepancies  are  inconsis¬ 
tent  predictions  made  under  different  sets  of  assumptions  (that  is,  in  different 
environments).  For  example,  suppose  the  inputs  to  the  two-adder  device  were 
as  in  the  first  case,  but  the  output  was  observed  to  be  6  (Figure  2.5).  Su¬ 
perimposing  the  two  sets  of  predictions,  it  can  be  seen  that  (among  other 
discrepancies)  node  X  is  predicted  to  be  4  if  Adder-1  is  working,  but  2  if 
Adder-2 is working. The union of the environments that underlie inconsistent
predictions is termed a conflict, and conflicts are denoted with angle brackets ⟨ ⟩. In
this case, ⟨Adder-1, Adder-2⟩ is a conflict.
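
A minimal sketch of how a discrepancy at node X turns into a conflict is given here. It assumes the same hypothetical (value, environment) encoding as a plain Python tuple; the helper names adder_backward and conflict are illustrative and are not taken from GDE.

```python
EMPTY = frozenset()  # environment of an observation: no assumptions


def adder_backward(name, output, known_input):
    """Infer an adder's unknown input from its output and its other input."""
    (out_value, out_env), (in_value, in_env) = output, known_input
    return out_value - in_value, out_env | in_env | {name}


def conflict(prediction_a, prediction_b):
    """Return the union of environments if two predictions disagree, else None."""
    (value_a, env_a), (value_b, env_b) = prediction_a, prediction_b
    if value_a == value_b:
        return None
    return env_a | env_b


# Forward prediction from the inputs: X is 4 if Adder-1 is working.
x_from_adder1 = (4, frozenset({"Adder-1"}))
# Backward prediction from the observed output 6 and the observed input 4.
x_from_adder2 = adder_backward("Adder-2", (6, EMPTY), (4, EMPTY))

found = conflict(x_from_adder1, x_from_adder2)
print(x_from_adder2, found)
```

The two predictions for X disagree, so the union of their environments is reported as the conflict.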


Figure  2.5:  Discrepancies  Produce  Conflicts 


A  conflict  is  a  set  of  assumptions  that  contains  at  least  one  that  must  be 
false.  In  troubleshooting,  the  assumptions  are  about  whether  components 
are  working  properly,  so  it  can  be  thought  of  as  a  set  of  components  that 
cannot  all  be  working  properly.  If  one  of  the  components  in  each  conflict  were 
actually  failing,  it  would  resolve  the  inconsistency.  The  minimal  set  covers 
of these conflicts are termed candidates, denoted with square brackets [ ]. By
Occam’s  razor  only  the  minimal  set  covers  (those  with  no  subsets  that  are 
covers)  are  needed;  the  minimal  covers  are  the  simplest  explanations  for  the 
inconsistency.  Each  candidate  corresponds  to  a  set  of  components  that  would 
resolve  all  the  inconsistencies  if  all  of  them  were  failing.  For  example,  if  there 
is just one conflict ⟨Adder-1, Adder-2⟩ there are two singleton candidates,
denoted  [Adder-1]  and  [Adder-2].  The  covering  set  that  includes  both  adders 
is  not  a  candidate,  since  it  is  not  minimal. 

This  scheme  incorporates  the  handling  of  multiple  faults  in  a  natural 
way.  Suppose  we  subsequently  observe  that  X  is  5.  There  would  then  be 
two conflicts ⟨Adder-1⟩ and ⟨Adder-2⟩, and their minimal set cover would be
the candidate [Adder-1, Adder-2], meaning that both Adder-1 and Adder-2
are faulty. In general, the number of candidates can be exponential in the
number of conflicts. Consider for example 2n assumptions and n conflicts,
one for each pair of assumptions 2i and 2i + 1; this results in 2^n candidates.
Exponential  blowup  is  rare  in  practice;  a  more  common  phenomenon  is  that 
along  with  a  small  set  of  single-fault  candidates  there  will  be  a  larger  set 
of multiple-fault candidates. For example, the two conflicts ⟨A, B, C, D⟩
and ⟨D, E, F, G⟩ yield one single-fault candidate [D] and nine two-fault
candidates. 
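
Candidate generation as described above amounts to computing the minimal set covers (hitting sets) of the conflicts. The following brute-force sketch is offered only to make the definition concrete; it is not the incremental algorithm used in GDE, and the function name minimal_candidates is hypothetical. It enumerates component sets in order of increasing size and keeps those that intersect every conflict and contain no smaller cover.

```python
from itertools import combinations


def minimal_candidates(conflicts):
    """Return the minimal hitting sets of the conflicts, smallest first."""
    conflicts = [frozenset(c) for c in conflicts]
    components = sorted(set().union(*conflicts))
    found = []
    for size in range(len(components) + 1):
        for combo in combinations(components, size):
            candidate = frozenset(combo)
            # Skip supersets of a cover already found: they are not minimal.
            if any(smaller <= candidate for smaller in found):
                continue
            # A cover must contain at least one member of every conflict.
            if all(candidate & c for c in conflicts):
                found.append(candidate)
    return found


one_conflict = minimal_candidates([{"Adder-1", "Adder-2"}])
two_conflicts = minimal_candidates([{"Adder-1"}, {"Adder-2"}])
overlapping = minimal_candidates([set("ABCD"), set("DEFG")])
disjoint_pairs = minimal_candidates([{0, 1}, {2, 3}, {4, 5}])
print(len(overlapping), len(disjoint_pairs))
```

The enumeration is exponential in the number of components, so it is suitable only for small illustrations such as the examples in the text.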

Strictly  speaking,  this  and  other  model-based  schemes  do  not  do  diagno¬ 
sis.  They  detect  differences  between  the  device  and  the  model  and  produce 
candidates  that  indicate  which  components  of  the  model  could  be  modified 
to  account  for  the  observations.  To  interpret  these  differences  as  indications 
that  certain  components  of  the  real  device  are  broken  requires  that  the  model 
has  fidelity,  that  is,  that  the  models  of  components  accurately  represent  their 
correct  behavior.  Because  of  the  practical  impossibility  of  having  models  that 
are  correct  in  every  respect,  it  is  important  to  understand  how  GDE  degrades 
in  the  face  of  failures  whose  effects  are  not  properly  modeled.  The  central 
issue  is  which  interactions  between  components  have  been  modeled;  failures 
that  result  in  the  coupling  of  components  through  unmodeled  interactions 
will  yield  incorrect  candidates. 

For  example,  the  standard  model  of  digital  circuits  says  that  each  node 
is  driven  to  0  or  1  by  just  one  gate.  Using  this  model,  upon  finding  a  dis¬ 
crepancy  at  a  given  node,  only  the  gate  driving  the  node  and  gates  upstream 
from  it  will  appear  in  the  resulting  conflict  (Figure  2.6). 


Figure  2.6:  If  x  is  not  1,  Only  A  Could  be  Broken 


In  reality,  the  gates  also  interact  through  current  flow.  The  gate  B  being 
driven  could  be  failing  in  a  way  that  pulls  down  the  node  x.  The  invert¬ 
ers  are  coupled  via  an  interaction  that  was  not  modeled,  so  the  standard 
digital  model  yields  the  wrong  answer.  Suppose  current  flow  were  modeled, 
so that the node x is 1 only when both A and B are working (Figure 2.7).
Now both inverters will show up as candidates. These candidates could be
disambiguated  by  observing  the  current  flow  into  B. 

Figure 2.7: Inverter B Could be Pulling x Down


There  are  even  more  candidates,  however.  For  example,  there  could  be  a 
short  between  node  x  and  some  other  node  (Figure  2.8).  Even  if  modeling  all 
the  possible  shorts  were  practical,  there  are  still  further  possible  interactions 
that  this  model  leaves  out. 

Figure 2.8: A Short Could be Pulling x Down


An  important  property  of  the  GDE  scheme  for  producing  fault  candidates 
is  the  way  it  degrades  in  the  face  of  failures  that  violate  the  device  model  by 
introducing  unexpected  interactions,  as  in  this  last  example.  With  enough 
observations,  the  results  will  be  interpreted  as  requiring  multiple-fault  ex¬ 
planations.  For  example,  suppose  the  symmetric  test  had  been  run  with  the 
inputs  to  A  and  C  swapped,  and  y  observed  to  be  0  instead  of  1.  A  new 
conflict ⟨C, D⟩ would have been discovered, and GDE would produce four
candidates: [A, C], [A, D], [B, C], and [B, D]. Among the candidates that
GDE produces will be multiple-fault candidates involving the components
influenced  by  the  new  connection.  In  fact,  any  failure  can  be  interpreted  as  a 
multiple-fault  failure  no  matter  how  drastic  its  effects,  since  there  is  always 
the  degenerate  candidate  consisting  of  all  the  components  in  the  device.  For 
example,  suppose  that  some  digital  circuit  model  does  not  explicitly  represent 
power  and  ground.  If  power  were  lost  then  every  component  might  appear 
to  be  behaving  incorrectly.  That  is  exactly  what  the  troubleshooting  engine 
would  produce  as  a  candidate:  one  in  which  every  component  is  broken. 

The  GDE  scheme  for  generating  candidates  from  conflicts  is  simple,  gen¬ 
eral,  and  to  the  extent  that  the  model  accurately  represents  the  structure  and 
behavior  of  the  device  it  yields  correct  results.  The  difficulty  is  that  since  the 
model  can  never  be  totally  correct,  only  the  fidelity  of  the  underlying  model 
gives  license  to  interpret  a  candidate  such  as  [A,  B]  as  meaning  that  both 
A  and  B  are  broken.  One  way  of  dealing  with  this  problem  is  illustrated  by 
HT [Davis84], in which discrepancies that can only be explained by
multiple-fault candidates are checked to see whether they could be explained as
single faults in alternative models of the circuit. One such alternative model
makes  the  physical  proximity  of  wires  explicit,  to  detect  shorts  like  that  in 
Figure  2.8. 

Another  difficulty  is  one  shared  by  any  troubleshooting  program,  namely, 
the  available  observations  of  the  device  might  be  too  crude  to  detect  discrep¬ 
ancies.  For  example,  suppose  a  behavior  model  predicts  a  particular  sequence 
of  zeroes  and  ones  will  appear  on  a  wire,  but  an  oscilloscope  can  only  de¬ 
termine  whether  the  signal  is  active  or  not.  Legitimate  discrepancies  and 
conflicts  may  well  go  undiscovered,  and  hence  some  inconsistent  candidates 
may  survive. 

2.2.4  Discrimination 

As  diagnosis  proceeds  there  are  usually  several  candidates  that  could  ex¬ 
plain  all  the  discrepancies.  To  discriminate  between  these  candidates  requires 
gathering  more  information  in  the  form  of  either  (i)  new  observations  of  the 
device  in  its  current  state,  or  (ii)  observations  of  its  response  to  some  new 
test  stimuli.  Since  there  are  typically  many  observations  and  tests  that  could 
be  performed,  the  program  needs  to  choose  which  of  them  to  do  next.  This 
choice can be formulated in terms of the cost of each action, the benefits
of their various outcomes, and the likelihoods of those outcomes. Using the
entropy  of  the  distribution  of  candidates  as  a  “benefit”  metric,  choosing  the 
observation  yielding  the  minimum  expected  entropy  as  in  [Gorry73]  can  be 
used  in  the  model-based  approach  just  as  in  the  symptom-based  approach. 
In  GDE  the  device  model  is  used  to  derive  the  expected  outcomes  of  each 
possible  observation  along  with  their  likelihoods;  the  details  will  be  discussed 
shortly. A similar framework is used in IN-ATE [Cantone83] to estimate the
likelihoods  of  various  circuit  test  outcomes. 

Recall  that  in  GDE  each  candidate  is  a  set  of  assumptions  that  would 
resolve  all  conflicts  if  they  were  all  false.  GDE  assigns  a  weight  to  each  can¬ 
didate  by  treating  each  assumption  as  independent  and  assigning  to  each  a 
prior  probability  near  1.0  of  being  true.  The  probability  of  a  candidate  is 
then  the  probability  that  all  the  assumptions  it  includes  are  false  and  all 
other  assumptions  are  true.  The  weight  of  each  candidate  is  its  probability 
normalized with respect to all candidates. Continuing the two-adder example,
let the initial probability of each adder working be p(Adder) = .99. The
weight of each candidate is .50, computed as shown:

Candidate    Probability                               Weight

[Adder-1]    (1 - p(Adder-1)) x p(Adder-2) = .0099     .50
[Adder-2]    p(Adder-1) x (1 - p(Adder-2)) = .0099     .50

Suppose there had been three adders A, B, and C with p(A) = p(B) =
p(C) = .99, and that there were two conflicts ⟨A, B⟩ and ⟨B, C⟩. There would
be two candidates [B] and [A, C] whose rankings would be as shown below (see
footnote 1). This yields the intuitively satisfying result that the single-fault
candidate [B] is much more likely than the multiple-fault candidate [A, C]:

Candidate    Probability                                  Weight

[B]          p(A) x (1 - p(B)) x p(C) = .0098             .99
[A, C]       (1 - p(A)) x p(B) x (1 - p(C)) = .000099     .01
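
The weighting scheme can be reproduced with a short sketch. It assumes, as the text does, that component failures are independent and that normalization runs over the minimal candidates only; the function name candidate_weights and the dictionary layout are hypothetical.

```python
def candidate_weights(candidates, prior_ok):
    """Map each candidate to (probability, normalized weight).

    prior_ok maps a component name to the prior probability that it works.
    """
    probabilities = {}
    for candidate in candidates:
        members = frozenset(candidate)
        p = 1.0
        for component, p_ok in prior_ok.items():
            # Components in the candidate are broken; all others are working.
            p *= (1.0 - p_ok) if component in members else p_ok
        probabilities[members] = p
    total = sum(probabilities.values())
    return {c: (p, p / total) for c, p in probabilities.items()}


two_adders = candidate_weights(
    [{"Adder-1"}, {"Adder-2"}], {"Adder-1": 0.99, "Adder-2": 0.99}
)
three_adders = candidate_weights(
    [{"B"}, {"A", "C"}], {"A": 0.99, "B": 0.99, "C": 0.99}
)
for candidate, (probability, weight) in three_adders.items():
    print(sorted(candidate), round(probability, 6), round(weight, 2))
```

Because only minimal candidates are normalized, the weights are relative rankings rather than true posterior probabilities.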

There will nearly always be several competing candidates. To discriminate
among them, GDE considers all the possible observations that could be
made next, and by a one-level lookahead picks the observation that is
expected to yield the most information. The probability of each outcome of a
possible observation is estimated as the combined weight of those candidates
with which the outcome would be consistent.

Footnote 1: The normalization is a heuristic step that ignores non-minimal
candidates. Both A and B could be broken, all three could be broken, and so
forth. The residual probability is distributed among these other non-minimal
candidates.

In  the  two-adder  example,  according  to  the  predictions  an  observation  at 
X  has  two  outcomes;  either  it  is  4  (if  Adder-1  is  working),  or  it  is  2  (if  Adder-2 
is  working).  An  outcome  of  2  is  consistent  with  the  candidate  [Adder-1], 
and  4  is  consistent  with  the  candidate  [Adder-2].  Each  candidate  weight 
is  .5  so  the  probability  of  each  outcome  is  estimated  as  .5  as  well.  The 
expected  information  gain  from  making  a  given  observation  can  be  estimated 
as the negative of the entropy in that distribution of outcomes (the sum
of p_i log2(p_i) over the outcomes i). The observation that maximizes the
additional information is selected. In the two-adder example the computation
is trivial. The entropy is .5 log2(.5) + .5 log2(.5) = -1.0 and the information
is -(-1.0) = 1.0. A probe anywhere already observed yields information of
0  and  X  is  the  only  signal  not  observed,  so  probing  X  is  obviously  the  right 
choice.  In  less  trivial  examples  this  technique  tends  to  choose  observations 
that,  roughly  speaking,  divide  the  space  of  outstanding  candidate  weights  in 
half. 
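
The one-level lookahead for the two-adder example can be sketched as follows. This is a simplified reading of the description above rather than the scoring function actually used in GDE: each outcome's probability is taken to be the combined weight of the candidates consistent with it, and the names outcome_probabilities, information, and probes are hypothetical.

```python
import math


def outcome_probabilities(weights, consistent_with):
    """Estimate each outcome's probability from the candidate weights."""
    return {
        outcome: sum(weights[c] for c in candidates)
        for outcome, candidates in consistent_with.items()
    }


def information(probabilities):
    """Negative of the sum of p * log2(p) over the outcomes."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


weights = {"[Adder-1]": 0.5, "[Adder-2]": 0.5}
# For each node, the possible outcomes and the candidates consistent with them.
probes = {
    "X": {2: ["[Adder-1]"], 4: ["[Adder-2]"]},
    "Y": {6: ["[Adder-1]", "[Adder-2]"]},  # already observed: one outcome only
}
scores = {
    node: information(outcome_probabilities(weights, outcomes).values())
    for node, outcomes in probes.items()
}
best = max(scores, key=scores.get)
print(scores, best)
```

A node that has already been observed has a single outcome of probability 1 and therefore scores zero, so the unobserved node X is selected.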

Relying  on  a  fixed  set  of  observations  or  tests  is  not  always  practical, 
however.  In  domains  such  as  digital  circuit  diagnosis  it  can  be  more  effective 
to  design  a  test  specifically  to  help  discriminate  between  candidates.  This 
approach  is  taken  by  DART  [Genesereth84],  which  repeatedly  generates  tests 
(using  an  implementation  of  path  sensitization  [Roth67])  until  it  finds  one 
that  will  yield  distinguishable  outcomes  given  different  candidates.  Such  tests 
can  be  generated  more  effectively  if  information  about  candidates  is  used 
while  creating  the  test  [Shirley83].  The  program  discussed  in  this  report 
selects  observations  based  on  the  scheme  in  GDE,  but  neither  selects  nor 
generates  tests. 


2.2.5  Hierarchic  Diagnosis 

Hierarchic  diagnosis  is  usually  viewed  in  terms  of  recursive  descent.  The 
troubleshooting  program  first  isolates  the  fault  to  a  component  at  a  certain 
level  of  detail,  then  proceeds  to  diagnose  the  failure  within  its  substructure, 
until  a  primitive  level  of  detail  is  reached.  Each  level  of  structural  detail 
usually has associated with it a level of behavioral detail as well. Nearly
all model-based troubleshooting programs incorporate hierarchic diagnosis
controlled  in  this  way. 

The  GDE  scheme  can  be  extended  to  do  hierarchic  diagnosis.  For  exam¬ 
ple,  suppose  in  the  two-adder  example  that  X  is  observed  to  be  2,  so  that 
⟨Adder-1⟩ is the only conflict and hence [Adder-1] the only candidate. If the
adders  are  not  primitive  components,  but  rather  have  the  substructure  of 
four-bit  ripple-carry  adders  (Figure  2.9),  then  troubleshooting  can  continue 
at  the  structural  level  of  full  adder  slices  and  behavioral  level  of  bits.  Each  of 
the  adder  slices  has  a  “sum”  bit  output  and  a  “carry”  bit  output  that  feeds 
into  the  next  slice. 


Figure  2.9:  Diagnosis  of  Adder-1 


The model predicts that if S0 is working its carry output will be 0. The
sum output of S1 is predicted to be 0 if both S0 and S1 are working. The
carry output of S1 will be 1 no matter what the carry-in from S0 was, since
two of its inputs are 1 already. The sum output of S2 will be 1 if both S1
and S2 are working. The observation that the adder output is 2 corresponds
to observations that the sum outputs of S0 through S3 are 0, 1, 0, and 0
respectively. These observations are inconsistent with the predicted outputs
at S1 and S2, producing two conflicts ⟨S0, S1⟩ and ⟨S1, S2⟩. These two
conflicts yield the single-fault candidate [S1] and the two-fault candidate
[S0, S2].
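
The slice-level reasoning can be checked with a small sketch of a ripple-carry adder whose predictions carry environments. It rests on several assumptions that are not stated in the text: the carry into S0 is a known 0, the adder inputs are observed, and only forward predictions are compared against the observed sum bits. The function names are hypothetical.

```python
def ripple_predictions(a, b, width=4):
    """Predict each slice's sum bit, tagged with the slices it depends on."""
    carry, carry_env = 0, frozenset()  # carry into S0 assumed to be a known 0
    predictions = []
    for i in range(width):
        name = f"S{i}"
        x, y = (a >> i) & 1, (b >> i) & 1
        # The sum bit depends on this slice and on whatever produced the carry-in.
        predictions.append((x ^ y ^ carry, carry_env | {name}))
        if x == y:
            # Equal input bits fix the carry-out regardless of the carry-in.
            carry, carry_env = x, frozenset({name})
        else:
            # Otherwise the carry-in passes through this slice.
            carry_env = carry_env | {name}
    return predictions


def slice_conflicts(a, b, observed, width=4):
    """Environments of the sum-bit predictions contradicted by the output."""
    found = []
    for i, (bit, env) in enumerate(ripple_predictions(a, b, width)):
        if bit != (observed >> i) & 1:
            found.append(env)
    return found


predictions = ripple_predictions(2, 2)
conflicts = slice_conflicts(2, 2, observed=2)
print(conflicts)
```

With both inputs equal to 2 and an observed output of 2, the discrepancies fall at the sum outputs of S1 and S2.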

Note  that  hierarchic  diagnosis  can  also  be  worthwhile  even  when  the 
fault  has  not  been  fully  isolated  (technically,  isolation  would  mean  that  the 
assumption  that  the  component  is  working  is  a  singleton  conflict).  In  the 
two-adder  example,  suppose  that  X  has  not  yet  been  observed,  so  that  both 
[Adder-1]  and  [Adder-2]  are  candidates.  Both  adders  are  descended  into, 
revealing slices S0 through S3 in Adder-1 and S4 through S7 in Adder-2. Some
of the newly discovered conflicts are shown in Figure 2.10: ⟨S1, S2, S3, S6, S7⟩,
⟨S0, S1, S2, S5⟩, ⟨S1, S2, S4, S5⟩, and ⟨S0, S1, S4, S5⟩.

From  these  conflicts  all  of  the  subcomponents  of  Adder-2  can  be  ruled  out 
as  single-fault  candidates,  without  requiring  any  more  observations.  In  fact 
there is only one singleton candidate: [S1]. The other minimal candidates are
[S0, S2], [S2, S5], [S2, S4], [S3, S5], [S5, S6], [S5, S7], [S0, S4, S7], [S0, S4, S6],
and [S0, S3, S4].

Most  discussions  of  hierarchic  diagnosis  in  model-based  troubleshooting 
programs  present  a  simplified  picture  in  terms  of  isolation  to  a  single  com¬ 
ponent  followed  by  recursive  descent.  As  this  example  suggests,  effective 
diagnosis  of  more  complex  systems  is  likely  to  require  considering  multiple 
levels  of  detail  even  when  there  are  several  candidates,  as  done  in  ABEL 
[Patil81]  and  in  the  program  discussed  in  this  report. 

2.2.6  Summary  of  the  Model-Based  Approach 

Although  differing  in  implementation  technology,  all  model-based  trou¬ 
bleshooting  programs  share  the  same  underlying  organization.  A  device 
model  produces  predictions  about  behavior  and  about  what  ought  to  be 
observed.  A  separate  troubleshooting  engine  then  produces  alternative  diag¬ 
noses  that  each  resolve  all  discrepancies  between  the  model  and  the  actual 
observations. 

The notion of a “device model” is that of a lumped-element description
consisting of components and connections. In committing to any such
representation the program sacrifices some degree of coverage, since there will be
failures that it will misdiagnose.


Figure 2.10: Diagnosis of Adder Substructures

Behavior  prediction  in  such  a  model  can  (for  the  most  part)  be  done  by 
local  propagation,  that  is,  each  prediction  is  made  on  the  basis  of  information 
local  to  a  single  component.  The  choice  of  level  of  detail  to  represent  behavior 
and  of  the  machinery  that  manipulates  it  both  inevitably  sacrifice  precision 
and  completeness  of  predictions  for  the  sake  of  efficiency.  In  troubleshooting, 
the  effect  is  to  sacrifice  some  degree  of  resolution  since  there  will  be  some 
failures  that  cannot  be  distinguished. 

In the GDE framework, each prediction is tagged with its set of supporting
environments (sets of assumptions about which components are working).
Discrepancies result in conflicts (sets of assumptions that contain at least
one false assumption). Each covering set of these conflicts is a possible
diagnosis; by Occam’s razor the minimal covers are selected as candidates. One of
the  important  properties  of  the  scheme  is  that  when  faced  with  a  failure  that 
cannot  be  represented  in  the  model,  it  proposes  multiple-fault  candidates 
rather  than  (say)  declaring  an  irreconcilable  inconsistency. 

There  are  nearly  always  several  different  candidates.  Candidates  are  dis¬ 
criminated  by  assigning  each  a  weight  based  on  its  normalized  prior  prob¬ 
ability,  and  if  there  is  no  clearly  dominant  candidate,  an  observation  with 
the  minimum  entropy  is  selected.  When  further  conflicts  result  from  the 
observation,  some  candidates  are  ruled  out  and  others  become  more  likely. 

Finally,  having  isolated  a  fault  to  a  single  component,  hierarchic  diagnosis 
proceeds  by  descending  into  substructure  of  the  component,  if  any.  The 
additional  information  available  at  lower  levels  of  detail  may  also  be  useful 
for  discriminating  candidates  even  if  the  fault  has  not  been  uniquely  isolated, 
as illustrated above.


Chapter 3

Troubleshooting  Scenarios 


This work makes several claims about representing digital circuits for
model-based troubleshooting. The support for these claims comes largely from a
set  of  implemented  examples  of  circuit  structure  and  behavior,  and  from  the 
fact  that  the  troubleshooting  engine  can  successfully  diagnose  faults  using 
those  models.  The  scenarios  have  been  collected  into  this  early  chapter  to 
provide  context  and  motivation  for  subsequent  discussions  of  the  structure 
and  behavior  of  these  circuits.  Indeed,  a  central  theme  of  this  work  is  that 
the  intended  use  of  a  model  impacts  what  gets  mentioned  in  the  model;  this 
chapter  shows  the  reader  that  intended  use. 

The program that does these examples is organized into several subsystems
(Figure 3.1). There is a domain-independent troubleshooting engine
XDE that extends the GDE approach so as to use hierarchic diagnosis and
fault models. The physical and functional organization of the circuits to be
diagnosed is represented in a language called BASIL. The behavior of the
components in those circuits is represented in a temporal constraint
propagation language TINT. All of these are built using JOSHUA [Rowley87], which
provides implementations of data storage and retrieval along with forward
and backward chaining rules. BAR-JOSEPH embodies the author’s extensions
to JOSHUA, including a simple inheritance facility and an assumption-based
truth maintenance system based on boolean constraint propagation
[McAllester80b] [deKleer86a]. Chapters 4 through 7 discuss XDE, BASIL,
and TINT; the underlying JOSHUA and BAR-JOSEPH implementations are
not discussed in detail.

Figure 3.1: Overall Troubleshooting Program Organization
[Diagram layers, top to bottom: XDE (Troubleshooting), BASIL (Circuit
Structure), and TINT (Circuit Behavior); BAR-JOSEPH (Truth Maintenance);
JOSHUA (Rule Language)]

The troubleshooting examples are all taken from the Console Controller
Board of the Symbolics 3600 series console. The board has approximately
50  chips  and  300  visible  circuit  nodes;  the  largest  example  currently  handled 
involves  20  chips  and  100  visible  nodes.  In  the  descriptions  of  structure  and 
behavior,  the  following  conventions  are  adhered  to: 

•  U25  is  a  typical  chip  name.  RN7  is  a  typical  name  for  a  nine-resistor 
network  that  is  treated  just  like  a  chip. 

•  n178 is a typical name for a circuit node, or, to be precise, for a wire
etch  as  represented  by  the  programs. 

•  FD01  is  a  typical  name  for  a  component  such  as  a  Frequency  Divider. 
One-of-a-kind  components  are  usually  given  one  or  two  letter  names 
such  as  U  (a  microprocessor)  or  R  (the  Reset  Hold  Counter). 

•  U30a  and  U30b  are  typical  names  for  the  flipflops  that  reside  on  chip 
U30.  In  general  the  a,  b,  c  suffixes  denote  functional  units  within  a 
chip. 

The  figures  that  show  the  physical  and  functional  organization  of  circuits 
obey the following conventions:

•  A box with thin lines indicates the boundaries of a physical component,
usually  a  chip. 

•  A box with thick lines indicates a functional component such as a
flipflop, which may have a complex correspondence to a physical component.

•  Where  a  box  name  such  as  U  is  not  sufficiently  informative,  the  type 
of  the  box  is  shown  in  a  slanted  font  as  Input  Processor. 

•  Thick  lines  with  arrowheads  indicate  connections  between  components; 
technically  they  are  “signals”  as  defined  in  Chapter  5. 

The examples summarize the output transcripts found in Appendix A.1
through Appendix A.11:

•  One  example  involving  three  chips  in  the  section  of  the  board  respon¬ 
sible  for  generating  clocks  at  various  frequencies. 

•  Four  examples  involving  ten  chips  in  the  Audio  Decoder  section,  re¬ 
sponsible  for  translating  an  asynchronous  digital  audio  signal  from  the 
host  into  a  signal  that  drives  a  speaker. 

•  Two  examples  involving  twenty  chips  in  the  Input  Encoder  section, 
responsible  for  transmitting  keystrokes  and  mouse  motions  to  the  host. 


3.1  Clock  Generator  Examples 

The  Clock  Generator  circuit  shown  in  Figure  3.2  and  Figure  3.3  is  respon¬ 
sible  for  generating  10  Mhz,  5  Mhz,  and  2.5  Mhz  clock  signals  that  will 
be  distributed  throughout  the  board.  It  is  a  trivial  circuit,  of  course,  but 
nevertheless  raises  important  issues. 

The  generator  consists  of  a  crystal  oscillator  OSC  that  produces  a  10  Mhz 
TTL  clock.  The  inverter  in  this  circuit  is  acting  as  a  buffer  (FB01);  the 
frequency  of  its  output  is  the  same  as  its  input.  Two  separate  frequency 
dividers FD01 and FD02 are implemented with the dual flipflops on chip
U30; if all the components {U25,U32,U30} are working then the output at
n158 is a 5 Mhz clock and the output at n167 is 2.5 Mhz.


3.1.1  Troubleshooting  the  Clock  Generator 

Assume n167 is observed to be “flat,” that is, its frequency is zero. This yields
⟨U25, U32, U30⟩ as a conflict and hence [U25], [U32], and [U30] as candidates.
Crystal  oscillators,  because  of  their  internal  structure  and  the  way  they  are 
packaged,  tend  to  fail  more  frequently  than  other  components.  Hence  U25  is 
more  likely  to  be  broken  than  U32  or  U30.  A  probe  at  n291  can  be  made  to 
confirm  this. 

Assume the signal at n291 is observed to have frequency 10 Mhz, as would
be expected if the oscillator were behaving normally. Oscillators also fail in
a  characteristic  fashion:  they  produce  a  flat  output  rather  than  the  desired 
periodic  waveform.  Hence  the  oscillator  becomes  a  much  less  likely  suspect, 
though  still  logically  possible  since  the  exact  shape  of  every  pulse  cannot  be 
examined. 

This  leaves  U30  and  U32  as  the  main  suspects  and  node  n205  as  the  next 
good  place  to  probe,  because  that  would  tell  which  chip  needs  replacing. 
If  that  signal  is  probed  and  observed  to  be  flat  (zero  frequency)  it  is  rela¬ 
tively  certain  that  U32  is  broken,  since  otherwise  the  signal  would  have  had 
frequency  10  Mhz. 
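
The sequence of conflicts in this scenario can be reproduced with a frequency-level sketch. Several details here are assumptions made for illustration and are not confirmed by the text: that the oscillator U25 drives n291, that U32 buffers n291 onto n205, and that the two flipflops on U30 are cascaded so that n205 is divided to give n158 and n158 is divided to give n167. The sketch also ignores fault models and prior failure rates, which the program uses to rank the suspects; the name clock_conflicts is hypothetical.

```python
# Hypothetical stage list: (chip, input node, output node, frequency function).
STAGES = [
    ("U25", None, "n291", lambda f: 10.0),     # crystal oscillator, in MHz
    ("U32", "n291", "n205", lambda f: f),      # inverter acting as a buffer
    ("U30", "n205", "n158", lambda f: f / 2),  # frequency divider FD01
    ("U30", "n158", "n167", lambda f: f / 2),  # FD02 (cascade is an assumption)
]


def clock_conflicts(observed):
    """Return the sets of chips contradicted by observed frequencies (MHz)."""
    predicted = {}
    conflicts = []
    for chip, source, target, transfer in STAGES:
        if source is None:
            freq_in, env_in = None, frozenset()
        elif source in observed:
            # An observation needs no assumptions about upstream chips.
            freq_in, env_in = observed[source], frozenset()
        else:
            freq_in, env_in = predicted[source]
        freq_out, env_out = transfer(freq_in), env_in | {chip}
        predicted[target] = (freq_out, env_out)
        if target in observed and observed[target] != freq_out:
            conflicts.append(env_out)
    return conflicts


first = clock_conflicts({"n167": 0.0})
second = clock_conflicts({"n167": 0.0, "n291": 10.0})
third = clock_conflicts({"n167": 0.0, "n291": 10.0, "n205": 0.0})
print(first, second, third)
```

Because the model is stated in terms of frequency alone, no individual clock edges need to be simulated to localize the fault.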

3.1.2  Morals  of  the  Clock  Generator  Example 

Simple  as  it  is,  the  Clock  Generator  example  illustrates  three  key  ideas: 

Temporally  coarse  behavior  models  can  be  adequate  for  troubleshooting. 
Although  the  Clock  Generator  is  a  digital  circuit,  the  traditional  model  of 
digital  behavior  that  involves  individual  clock  cycles,  rising  and  falling  edges, 
and  so  forth,  was  inappropriate  for  this  troubleshooting  example.  A  much 
simpler,  temporally  coarse  description  of  its  behavior  involving  the  notion  of 
“frequency”  provided  just  as  much  ability  to  localize  the  fault.  The  detailed 
model  would  have  uselessly  predicted  many  events  individually  undetectable 
using an observation technology as simple as an oscilloscope. That abstractions
simplify reasoning is obvious; what is important is that in this case
the  nature  of  the  abstraction  was  explicitly  temporal  and  made  traditional 
simulation  unnecessary. 

The  representation  of  physical  organization  is  essential  for  troubleshoot¬ 
ing.  Failures  and  repairs  occur  in  physical  devices,  not  in  the  functional 
organization  that  we  attribute  to  them,  hence  the  physical  organization  of 
the  device  needs  to  be  represented  explicitly.  The  value  of  representing  the 
physical  organization  has  previously  been  associated  with  the  diagnosis  of 
unusual  faults  such  as  solder  bridges  [Davis84];  in  fact  the  model  should  al¬ 
ways  include  the  physical  information,  for  the  more  mundane  reason  that 
it  can  save  the  troubleshooter  from  spending  effort  on  distinguishing  faults 
that  share  the  same  repair.  In  this  example,  there  was  no  need  to  distinguish 
which  of  two  flipflops  might  have  been  broken,  since  the  repair  in  both  cases 
was identical: replace chip U30.

Fault models are useful heuristics. There was added focusing power available
from heuristic knowledge about relative failure rates of components and
likely  misbehaviors.  In  this  example,  without  knowing  that  oscillators  com¬ 
monly  fail  in  a  particular  way,  the  observation  that  the  oscillator  output  had 
frequency  10  Mhz  would  have  told  us  nothing  at  all  about  the  oscillator.  This 
kind  of  knowledge  can  only  be  used  to  discount  possible  diagnoses,  never  to 
support  them  directly.  The  added  knowledge  discounts  the  possibility  that 
the  oscillator  was  broken  and  hence  promotes  the  more  likely  diagnoses.  Con¬ 
versely,  had  n291  been  flat  instead  of  10  Mhz,  the  conflict  (U25)  would  have 
resulted  and  the  oscillator  been  identified  as  faulty  after  only  one  probe. 


3.2  Audio  Decoder  Examples 

The  Audio  Decoder  is  responsible  for  converting  an  asynchronous  serially 
encoded  12-bit  digital  signal  into  a  voltage  in  the  range  +15  to  -15  volts.  It 
involves  ten  chips  and  fifty  visible  circuit  nodes  (Figure  3.4).  The  simplifica¬ 
tions  made  for  presentation  are  that  explicit  information  about  wire  etches, 
and  an  alternate  signal  path  into  the  analog-to-digital  converter,  have  been 
omitted. 

The four troubleshooting examples illustrate the following ideas:

The  behavior  of  components  should  be  represented  in  terms  of  features  that 
are  easy  for  the  troubleshooter  to  observe.  The  vocabulary  of  observations 
that  the  troubleshooter  can  make  provides  a  vocabulary  that  can  be  used 
in  modeling  the  behavior  of  the  device.  For  example,  if  one  assumes  that 
only  certain  features  of  a  signal  can  be  observed  using  an  oscilloscope,  then 
that  set  of  features  defines  one  level  of  abstraction  at  which  to  model  the 
behavior  of  the  device  and  its  components.  This  model  may  not  provide 
sufficient  resolution,  so  a  more  detailed  model  may  be  needed  as  well;  the 
point  is  that  the  vocabulary  of  observations  provides  guidance  as  to  what 
abstractions  may  be  useful. 

Components in the representation of the functional organization of the
circuit should facilitate behavioral abstraction. Representing the organization
of a device hierarchically has advantages noted earlier. Hierarchic organization
by itself, however, provides no leverage on the fundamental goal
of troubleshooting (to discriminate between candidates) unless there is
a behavioral characterization of each component that would be difficult or
expensive to derive from its subcomponents.

3.2.1  Functional  Organization  of  the  Audio  Decoder 

Figure 3.5 shows the three stages of the Audio Decoder: a clock is extracted
from the incoming asynchronous Manchester signal by MTS01; the resulting
clocked serial signal is converted into a 12-bit parallel signal with a write
strobe by STP01; and the parallel signal is then converted to a voltage by the
digital-to-analog converter PTA01. STP01, which converts from synchronous
serial to parallel data, has three components: CSA01 accumulates the data
bits in two shift registers, while a pair of counters in CSB01 count the number
of bits since the first arrived. When all the bits have arrived CSB01 asserts
n290 to latch the parallel data into the digital-to-analog converter. BUF01
buffers the serial clock n34 extracted from the incoming signal and strobes
MTS01 using n232.

Most of these functional components can be viewed as simply converting
information from one encoding to another. In particular, the signals denoted
ser01 and par01 both carry streams of 12-bit digital values; only their underlying
encoding is different. Hence, MTS01, STP01 and PTA01 are modeled
abstractly as buffers. The burst detector CSB01 converts incoming 12-bit
bytes into single pulses on its output. The “clocked serial accumulators”
(CSA01, CSA02, CSA03) are shift registers that accumulate the incoming serial
data bits in each burst. The individual data values are not represented
explicitly. Rather, there are abstract signals which, although they could in
principle be computed at every point in time, are in fact only observed and
reasoned about in terms of features such as their amplitudes and rates of
change.

Each  of  the  signals  shown  is  described  using  features  that  an  oscilloscope 
can  easily  detect.  An  oscilloscope  can  be  used  to  measure  the  frequencies 
and  periods  of  signals.  For  nonperiodic  signals,  this  can  only  be  done  qual¬ 
itatively:  a  signal  which  is  neither  constantly  high  nor  constantly  low  has 
some  unspecified  positive  frequency,  and  is  characterized  as  “changing.”  For 
periodic  signals,  certain  shape  properties  can  be  observed:  in  particular,  the 
difference  between  the  maximum  and  minimum  value,  the  period  of  cross¬ 
ings  of  its  midpoint  value,  and  the  frequency  of  crossings  of  zero  in  the  first 
derivative  (that  is,  changes  of  direction).  In  these  troubleshooting  examples 
the Audio Decoder is presented with a 1 Khz sinusoidal signal¹. The sampling
rate is forty per period, that is, a new 12-bit quantity arrives every 25
μsec. The resulting digitally-encoded sinusoid is shown at the top of Figure
3.6. If this sinusoidal signal has higher order harmonic components, it is
simply  characterized  as  having  a  higher  frequency  in  the  first  derivative  than 
in  the  signal  itself.  The  bottom  of  Figure  3.6  shows  an  example,  a  distorted 
sinusoidal  voltage  signal  in  which  the  frequency  of  sign  changes  in  the  first 
derivative  is  higher  than  the  frequency  of  sign  changes  in  the  voltage. 
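
The oscilloscope vocabulary described above can be illustrated with a short Python sketch. It is not part of the troubleshooting program; the function names and the choice of a fifth harmonic as the distortion are assumptions made only for this example. It samples a sinusoid forty times per period and counts midpoint crossings and changes of direction.

```python
import math

SIGNAL_HZ = 1000          # 1 Khz test input, as in the text
SAMPLES_PER_PERIOD = 40   # forty samples per period, as in the text
SAMPLE_INTERVAL = 1.0 / (SIGNAL_HZ * SAMPLES_PER_PERIOD)  # 25 microseconds

def sample(harmonic_gain, periods=10):
    """Sample sin(x) + harmonic_gain * sin(5x).

    The fifth harmonic is an assumed stand-in for the distortion; the
    half-sample offset keeps samples away from exact zero crossings.
    """
    values = []
    for k in range(periods * SAMPLES_PER_PERIOD):
        x = 2 * math.pi * (k + 0.5) / SAMPLES_PER_PERIOD
        values.append(math.sin(x) + harmonic_gain * math.sin(5 * x))
    return values

def amplitude(samples):
    """Difference between the maximum and minimum value."""
    return max(samples) - min(samples)

def midpoint_crossings(samples):
    """Number of times the signal crosses its midpoint value."""
    mid = (max(samples) + min(samples)) / 2
    above = [s > mid for s in samples]
    return sum(a != b for a, b in zip(above, above[1:]))

def direction_changes(samples):
    """Number of sign changes in the first difference (changes of direction)."""
    diffs = [b - a for a, b in zip(samples, samples[1:]) if b != a]
    return sum((d1 > 0) != (d2 > 0) for d1, d2 in zip(diffs, diffs[1:]))

pure = sample(0.0)
distorted = sample(0.5)
```

Under these assumptions the two signals cross their midpoints equally often, but the distorted one changes direction far more often, which is the kind of discrepancy the coarse vocabulary can express.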

1 It is assumed that the 1 Khz signal is the only test input available. It turns out, in
fact, that other test inputs would not provide appreciably more diagnostic resolution given
the observability constraints already assumed, so that in itself is not a major handicap.

Figure 3.6: Signal with Too Many Zero Crossings in its First Derivative

The limited ability of the oscilloscope to characterize the voltage output
of the Audio Decoder means that the signal par01 need only be characterized
in the vocabulary of the oscilloscope. Only the frequency of the signal,
crossings of its midpoint value, and zero crossings in the first derivative
need be mentioned. Since the information carried by par01 is encoded as a
twelve-bit digital signal and cannot be directly observed, it is necessary to
characterize the relationship between par01 and the underlying signals that
can be observed, namely the individual data bits and write strobes. To take
two representative examples of this relationship: if the signal par01 crosses
its midpoint value with a frequency of n, then the most significant data bit
has frequency of at least n because it has to change its value at least as often
as par01 does; similarly, if the write strobe signal is always high, then the
signal par01 never changes, so that its frequency is zero and the difference
between its maximum and minimum is zero.

The  accumulators  CSA01,  CSA02,  and  CSA03  all  act  as  delay  elements: 
each  incoming  data  bit  appears  some  time  later  at  each  of  the  output  bits 
of  the  shift  registers.  Hence  given  sufficiently  many  bytes  transmitted,  the 
frequency of each individual bit of the output signal should be the same as
that of the incoming serial data measured with respect to the serial clock. To
see  why  this  is  so,  consider  an  8-bit  shift  register  that  has  an  incoming  signal 
clocked  into  its  most  significant  bit.  Suppose  that  input  signal  goes  from  0 
to  1  and  back  1000  times  during  a  certain  time  interval.  The  most  significant 
output  bit  will  change  either  999  or  1000  times,  the  next-to-most  significant 
output  bit  will  change  between  998  and  1000,  and  so  forth.  For  a  sufficiently 
large number of these cycles, the numbers of changes over that time interval
are essentially equivalent, hence their frequencies are equivalent. This is an
example  of  representing  the  behavior  of  components  in  terms  of  features  that 
are  easy  for  the  troubleshooter  to  observe,  in  this  case,  in  terms  of  whether 
or  not  the  signals  are  changing. 
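
The shift register argument can be checked with a small simulation. The following Python sketch is illustrative only; the function name and the particular alternating input are assumptions. It clocks a serial stream into the most significant bit of an 8-bit register and counts how often each output bit changes.

```python
def bit_change_counts(serial_bits, width=8):
    """Clock serial_bits into stage 0 (the most significant bit) of a shift
    register that starts at all zeros; return the number of value changes
    seen at each stage."""
    register = [0] * width
    changes = [0] * width
    for bit in serial_bits:
        shifted = [bit] + register[:-1]
        for stage in range(width):
            if shifted[stage] != register[stage]:
                changes[stage] += 1
        register = shifted
    return changes

# An input that goes from 0 to 1 and back 1000 times.
counts = bit_change_counts([0, 1] * 1000)
```

Each later stage misses at most one more change than the stage before it, so over a long interval the counts, and hence the frequencies, are essentially the same.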

The  subcomponents  CSA02  and  CSA03  of  CSA01  are  almost  identical  to 
CSA01,  except  that  CSA02  corresponds  to  chip  U21,  which  holds  the  7  most 
significant  bits,  and  CSA03  to  U44,  which  holds  the  5  least  significant. 

The  burst  detector  CSB01  is  responsible  for  generating  the  strobe  signal 
that  latches  data  into  the  digital-to-analog  converter.  The  clocked-serial 
input  can  be  characterized  as  a  sequence  of  bursts  of  activity  interspersed 
with  periods  of  quiescence.  Internally  CSB01  is  a  counter  that  is  reset  at 
the  beginning  of  each  burst  and  counts  up  the  number  of  clock  cycles  that 
are  seen,  finally  asserting  its  output  briefly  when  all  twelve  data  bits  have 
been  accumulated  by  CSA01.  This  output  is  then  used  as  a  strobe  for  the 
parallel  data.  Thus,  given  a  sequence  of  incoming  data  words,  CSB01  asserts 
its  output  once  per  word.  The  behavior  of  CSB01  is  described  in  terms  of 
frequencies;  the  output  frequency  is  positive  only  when  the  input  frequency 
is  positive. 

CSB01  is  a  good  example  of  how  explicit  knowledge  about  functional 
organization  simplifies  troubleshooting.  Simulating  the  behaviors  of  the  in¬ 
dividual  components  —  the  two  counters,  the  gates,  and  the  pullup  resistors 
—  would  be  relatively  tedious.  Encapsulating  them  along  with  the  feedback 
signal  yields  an  aggregate  behavior  that  is  almost  as  easy  to  describe  and 
simulate  as  that  of  just  one  counter.  Furthermore,  it  lends  itself  to  descrip¬ 
tion  in  terms  of  frequencies  and  rates  of  change,  features  that  are  easier  to 
observe than the individual counting steps.

3.2.2 Physical Organization of the Audio Decoder

The Audio Decoder is implemented using nine chips and a nine-resistor network
that is treated by the program as an ordinary chip. The correspondence
between the functional components and the physical chips is shown in Figure
3.7. The serial ser01 signal is carried by a clock and a data signal (n56
and n260, respectively). The parallel signal par01 corresponds to two control
signals n290 and n232 and twelve bits of data, named (from most to least
significant) n48, n289, n246, n88, n208, n139, n131, n112, n194, n159, n117
and n236.

The likelihood of failure for each chip is estimated from its physical complexity
as measured by the count of pins. The probability that a chip is normal
is simply the probability that all its pins are normal. Wires are assumed not
to fail.
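
A minimal sketch of this estimate in Python follows. The per-pin probability, the assumption that pins fail independently, the chip names, and the pin counts are all hypothetical values chosen for illustration; the text does not give them.

```python
def chip_ok_probability(pin_count, pin_ok_probability):
    """Probability that every pin of a chip is normal, assuming that pins
    fail independently with the same probability (an assumption)."""
    return pin_ok_probability ** pin_count

def relative_fault_likelihoods(pin_counts, pin_ok_probability=0.999):
    """Normalize the per-chip failure probabilities so that they sum to 1."""
    failing = {chip: 1.0 - chip_ok_probability(pins, pin_ok_probability)
               for chip, pins in pin_counts.items()}
    total = sum(failing.values())
    return {chip: p / total for chip, p in failing.items()}

# Hypothetical pin counts for two chips of different physical complexity.
likelihoods = relative_fault_likelihoods({"small_chip": 14, "large_chip": 24})
```

With these numbers the chip with more pins comes out as the somewhat likelier single fault.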

There  are  130  pins  in  the  audio  circuit;  in  principle  the  program  can 
suggest  probing  any  of  them.  However,  since  etches  are  assumed  not  to  fail, 
there  is  no  need  to  probe  more  than  one  pin  attached  to  any  given  etch, 
nor  is  there  any  need  to  probe  pins  that  are  attached  directly  to  power  or 
ground.  Hence  in  this  example  there  are  only  23  distinguishable  probes  that 
XDE  will  ever  suggest. 
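
The reduction from pins to distinguishable probes can be sketched as follows. This is an illustration rather than the procedure used by XDE; the function name, the pin numbers, and the power and ground etch names are hypothetical.

```python
def distinguishable_probes(pin_to_etch, excluded_etches=("VCC", "GND")):
    """Keep one representative pin per etch, skipping power and ground.

    Since etches are assumed not to fail, every pin on an etch carries
    the same signal, so one probe per etch is enough.
    """
    probes = {}
    for pin, etch in sorted(pin_to_etch.items()):
        if etch in excluded_etches:
            continue
        probes.setdefault(etch, pin)
    return probes

# Hypothetical fragment of a netlist (the pin numbers are invented).
netlist = {
    "U12.3": "n56", "U21.1": "n56",
    "U21.5": "n88", "U43.4": "n88",
    "U21.14": "VCC", "U21.7": "GND",
}
probes = distinguishable_probes(netlist)
```

Six pins in this fragment reduce to two probes, one for each signal etch.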

3.2.3  Audio  Decoder  Example  I 

Suppose  that  the  output  of  PTA01  is  observed  to  be  flat,  that  is,  zero  fre¬ 
quency  and  amplitude.  Any  of  the  ten  chips  could  be  responsible,  so  there 
are  ten  singleton  candidates,  one  corresponding  to  each  chip.  The  candidate 
[U43]  (the  digital-to-analog  converter)  is  judged  to  be  somewhat  likelier  than 
the  others. 

The model makes predictions about which of the signals in the circuit
should have a constant 1 value (n140, for example), which should have a
constant 0 value (n34, for example, which is 0 except during certain local
keyboard operations), and which should be changing. The program suggests
a number of signals that could be examined, shown below:

Place  Expected  Entropy  Supporting Environments
n290   changing  .83      {RN6,U12,U10,U11,U20}
n280   changing  .76      {RN6,U12,U10,U20}
n112   changing  .73      {RN6,U12,U21,U44}
n88    changing  .60      {RN6,U12,U21}
...    ...       ...      ...

The  highest  ranked  probe  is  of  n290,  one  of  the  write  strobes  to  PTA01. 
This  makes  sense,  since  if  this  signal  were  dead  it  would  explain  why  the  out¬ 
put  was  flat  and  would  tend  to  exonerate  the  shift  registers  that  accumulate 
the  incoming  data  bits.  Suppose  n290  is  observed  to  be  a  constant  1  instead 
of  changing  as  expected.  Since  it  was  supposed  to  be  changing  as  long  as 
the components {RN6, U12, U10, U11, U20} are all working properly, (RN6,
U12, U10, U11, U20) is a conflict. There are now five candidates, one corresponding
to each chip. [U12] is slightly likelier than the other candidates.
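
The text here does not spell out how the Entropy column is computed. One reading, offered only as an assumption, is that a probe is scored by the Shannon entropy of its predicted outcome, so that a probe whose result is most uncertain ranks highest. A Python sketch of that score for a two-outcome probe:

```python
import math

def binary_entropy(p):
    """Shannon entropy, in bits, of an observation that matches the
    prediction with probability p and contradicts it otherwise."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))
```

Under this reading a score of 1.0 corresponds to a probe whose two outcomes are equally likely, and a score near 0 to a probe whose outcome is almost certain in advance.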

Now a different group of probes is ranked the highest. All of the signals
on  the  data  bus  are  equally  good;  if  both  of  the  candidates  [RN6]  and  [U12] 
are  working  then  these  signals  should  be  changing: 


Place  Expected  Entropy  Supporting Environments
n88    changing  .72      {RN6,U12,U21}
n112   changing  .72      {RN6,U12,U21,U44}
n48    changing  .72      {RN6,U12,U21}
n159   changing  .72      {RN6,U12,U21,U44}
...    ...       ...      ...

Suppose  the  data  bit  n88  is  observed  to  have  the  constant  value  1  instead 
of  changing.  Now  (RN6,U12,U21)  is  a  new  conflict.  There  are  two  singleton 
candidates,  [RN6]  and  [U12],  and  three  two-component  candidates:  [U10, 
U21], [U11, U21] and [U20, U21]. The singleton candidates are judged to
be  the  most  likely,  and  U12  more  likely  to  fail  than  RN6.  The  probes  given 
the  highest  ranking  are  those  of  signals  that  are  expected  to  be  changing 
independently  of  whether  RN6  is  working  or  not,  namely  n56  and  n232: 

Place  Expected  Entropy  Supporting Environments
n56    changing  .92      {U12}
n232   changing  .92      {U12,U22}

Signal n56 is observed to have the constant value 1, so (U12) is a conflict
and hence U12 is the only singleton candidate.
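
The step from conflicts to candidates in this example can be reproduced by computing minimal sets of components that intersect every conflict. The Python sketch below is a brute-force illustration with a hypothetical function name; it is not claimed to be the algorithm used by the troubleshooting program.

```python
from itertools import combinations

def minimal_candidates(conflicts):
    """Return the minimal sets of components that intersect every conflict.

    Subsets are enumerated in order of increasing size, so any set that
    contains an already accepted candidate is not minimal and is skipped.
    """
    components = sorted(set().union(*conflicts))
    found = []
    for size in range(1, len(components) + 1):
        for combo in combinations(components, size):
            candidate = frozenset(combo)
            if any(smaller <= candidate for smaller in found):
                continue
            if all(candidate & conflict for conflict in conflicts):
                found.append(candidate)
    return found

# The two conflicts discovered after probing n290 and n88.
conflicts = [
    {"RN6", "U12", "U10", "U11", "U20"},
    {"RN6", "U12", "U21"},
]
after_two_probes = minimal_candidates(conflicts)

# Probing n56 adds the singleton conflict (U12).
after_three_probes = minimal_candidates(conflicts + [{"U12"}])
```

The first call yields [RN6], [U12], [U10, U21], [U11, U21] and [U20, U21]; once the conflict (U12) is added, only [U12] remains.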

The troubleshooting program performs well on this example; only three
probes  in  addition  to  the  initial  symptom  were  needed  to  yield  a  single 
candidate  with  much  higher  probability  than  the  others  (transcript  in  Ap¬ 
pendix  A.2).  More  important,  it  was  able  to  do  so  using  only  temporally 
coarse  predictions  about  the  signals  in  the  circuit,  predictions  in  terms  that 
corresponded  directly  to  probes  that  the  troubleshooter  could  make  easily. 

3.2.4  Audio  Decoder  Example  II 

Troubleshooting a second example with the same initial symptoms but a different
underlying fault yields poorer performance. When information about the
way components are expected to fail is included, however, the program's
performance improves dramatically (transcripts in Appendices A.4 and A.5).

Initially  the  output  is  observed  to  be  flat  and  instead  of  changing  as 
expected,  n290  is  observed  to  be  constant  1.  As  before,  the  five  most  likely 
candidates are [RN6], [U12], [U10], [U11], and [U20]. This time, however,
n88  is  observed  to  be  changing,  as  would  be  expected  if  everything  were 
normal.  Given  no  change  in  the  set  of  candidates,  probes  of  other  data  bus 
bits  still  appear  to  be  the  most  informative  probes;  for  example,  the  next  set 
of  suggestions  is  shown  below: 


Place  Expected  Entropy  Supporting Environments
n236   changing  .72      {RN6,U12,U21,U44}
n208   changing  .72      {RN6,U12,U21}
n117   changing  .72      {RN6,U12,U21,U44}
n289   changing  .72      {RN6,U12,U21}

The  next  six  probes  similarly  yield  no  new  conflicts  and  do  not  change 
the  set  of  candidates.  Finally,  the  program  suggests  probing  signal  n213,  the 
signal  that  was  immediately  upstream  of  the  discrepancy  observed  at  n290. 
If U20 is working properly, then it should be a constant 0. It is observed to
be  changing,  hence  (U20)  is  a  conflict  and  [U20]  the  single  highest  ranked 
candidate. 

The difficulty is that the program just did eight probes, six of which were
useless. The fact that even one of the bits of the data bus was changing
should have indicated that the problem was unlikely to be in U12 or RN6.
This  is  because  if  either  of  those  components  were  broken  the  entire  bus 
would  probably  be  inactive.  Hence,  the  more  informative  probes  would  have 
been  in  the  vicinity  of  CSB01. 

Including  fault  models  for  the  components  in  MTS01  and  CSB01  changes 
the  efficiency  of  the  troubleshooter  dramatically.  Now,  instead  of  suggest¬ 
ing  probes  of  the  data  bus  bits,  the  higher  ranked  probes  are  those  around 
CSB01,  the  component  responsible  for  producing  the  discrepant  signal  n290. 
In  particular,  n213  is  now  among  the  highest  ranked  probes: 


Place  Expected  Entropy  Supporting Environments
n213   changing  1.0      {RN6,U10,U11,U12,U20}
       0                  {U20}
n56    changing  1.0      {U12}
n159   changing  0.79     {RN6,U12,U21,U44}
n289   changing  0.79     {RN6,U12,U21}
...    ...       ...      ...

When n213 is observed to be changing, (U20) is a conflict and so
[U20] becomes the single likeliest candidate. Instead of making eight probes,
this  time  the  program  only  makes  two.  Furthermore,  using  fault  models  as 
heuristics  does  not  decrease  the  performance  of  the  other  troubleshooting 
examples.  The  other  scenarios  shown  require  the  same  number  of  probes 
with  or  without  fault  models. 
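
The way a fault model discounts a candidate can be illustrated with a simple reweighting. The Python sketch below is an assumption-laden illustration, not the program's mechanism: the equal prior weights and the likelihood values are invented, and stand for the judgment that a broken U12 or RN6 would probably leave the whole data bus inactive.

```python
def reweight(priors, likelihoods):
    """Scale each candidate's weight by how well its fault model explains
    an observation, then renormalize. Candidates with no fault model are
    left unscaled (likelihood 1.0)."""
    scaled = {c: priors[c] * likelihoods.get(c, 1.0) for c in priors}
    total = sum(scaled.values())
    return {c: weight / total for c, weight in scaled.items()}

# Hypothetical equal weights for the five candidates after probing n290.
priors = {"RN6": 0.2, "U12": 0.2, "U10": 0.2, "U11": 0.2, "U20": 0.2}

# Invented likelihoods of seeing n88 changing if the named chip is broken.
n88_changing = {"RN6": 0.1, "U12": 0.1}

posterior = reweight(priors, n88_changing)
```

After the reweighting U12 and RN6 are discounted, so probes near CSB01 become the more attractive ones, as in the second run of this example.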


3.2.5  Audio  Decoder  Example  III 

Suppose  that  instead  of  the  output  being  simply  flat,  its  amplitude  and  fre¬ 
quency  are  correct,  but  it  is  distorted  as  was  shown  earlier  in  Figure  3.6 
(Page  47).  Using  only  temporally  coarse  descriptions  of  signals,  the  trou¬ 
bleshooting  program  is  able  to  isolate  the  responsible  component  using  six¬ 
teen  probes. 

The  initial  symptom  is  that  the  frequency  of  zero  crossings  in  the  first 
derivative  of  the  output  signal  is  higher  than  expected.  All  components  are 
singleton  candidates,  and  as  in  previous  examples  the  first  probe  is  at  the 
write  strobe  signal  n290.  This  signal  is  expected  to  be  changing,  and  it  is. 
The  next  two  probes  are  at  internal  signals  of  CSB01,  and  appear  to  be 
changing as expected.

All but one of the next eleven probes are of the data bus bits, which are
all  expected  to  be  changing.  The  temporally  coarse  behavior  model  does  not 
include  enough  detail  to  indicate  which  of  the  bits  ought  to  be  probed  first; 
any  of  the  twelve  bits  having  the  wrong  value  at  the  moment  they  are  latched 
into  PTA01  could  result  in  a  distortion  similar  to  that  described.  Eventually 
the  signal  n246  is  discovered  to  be  stuck  at  1,  yielding  the  conflict  (RN6,U21) 
and  hence  the  likeliest  candidates  as  [U21]  and  [RN6].  After  two  more  probes 
the  conflict  (U21)  is  discovered  and  the  candidate  [U21]  is  left  as  the  final 
diagnosis  (transcript  in  Appendix  A.7). 

What  is  interesting  about  the  performance  of  the  troubleshooting  pro¬ 
gram  using  the  temporally  coarse  model  is  not  the  probes  it  did,  but  the 
probes  it  did  not  do.  The  serial  signals  n56  and  n260,  for  example,  were  not 
probed,  and  this  makes  sense:  if  there  were  faults  there  or  upstream  of  there, 
the  effects  would  probably  have  been  more  drastic  than  mere  distortion  of 
the  output.  Sixteen  probes  may  seem  like  a  lot,  but  it  would  require  a  consid¬ 
erably  more  temporally  detailed  (and  expensive)  model  to  do  much  better. 
To  determine  without  probing  exactly  which  data  bus  bits  were  wrong,  for 
example,  would  have  required  being  able  to  observe  the  shape  of  the  output 
to  twelve  bits  of  precision  and  at  just  those  moments  when  the  write  strobe 
signals  were  asserted.  While  this  is  not  impossible,  human  troubleshooters 
rarely  go  to  that  kind  of  trouble,  preferring  instead  to  do  a  few  more  simple 
probes. 

3.2.6  Audio  Decoder  Example  IV 

Like  any  abstraction,  temporally  coarse  behavior  models  discard  information. 
The  final  Audio  Decoder  troubleshooting  example  is  similar  to  the  previous 
example,  illustrating  that  temporally  coarse  models  discard  information  that 
could  potentially  have  been  used  to  improve  the  choice  of  probes. 

The  initial  symptom  is  that  the  amplitude  of  the  audio  output  signal  is 
correct,  but  the  frequency  of  its  zero  crossings  is  much  higher  than  expected. 
Figure  3.8  shows  the  expected  signal  and  that  observed. 

The initial probe of the write strobe n290 reveals that the signal is changing,
just as expected. The subsequent probe of n280, a signal inside CSB01,
however, reveals that it is a constant 1 instead of changing as expected. This
produces the conflict (RN6, U12, U22, U10, U11), and those five components
are the top candidates.

Figure 3.8: Signal with Too Many Zero Crossings


The observation that n280 is 1 triggers some new predictions (dashed
arrow in Figure 3.9). n280 is the carry-out signal of the four-bit counter U11.
Since that output is 0 whenever the Load control input to the counter is 0,
the model concludes that if the counter is working normally, then the load
control input n101 must have been a constant 1.

After seven probes elsewhere in the circuit, the program suggests probing
n101. It is observed to be changing, hence the counter U11 cannot be working
normally. Hence (U11) is a singleton conflict and [U11] is the single highest
ranked candidate.
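
The backward inference through the counter can be sketched in a few lines of Python. The function names and the string encoding of signal descriptions are assumptions made for this illustration; only the rule itself (carry-out is 0 whenever Load is 0) comes from the text.

```python
def expected_load_input(carry_out):
    """If the counter is working, its carry-out is 0 whenever Load is 0.
    A carry-out that is constantly 1 therefore implies that Load was
    constantly 1; other observations imply nothing here."""
    if carry_out == "constant 1":
        return "constant 1"
    return None

def counter_conflict(counter, carry_out, load_observed):
    """Return a singleton conflict if the observed Load input contradicts
    the value inferred from the carry-out, otherwise None."""
    expected = expected_load_input(carry_out)
    if expected is not None and load_observed != expected:
        return {counter}
    return None

# n280 (carry-out) was a constant 1, but n101 (Load) is seen changing.
conflict = counter_conflict("U11", "constant 1", "changing")
```

The contradiction depends only on the counter's own behavior, which is why the resulting conflict contains U11 alone.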

The  program  reached  a  diagnosis  with  eleven  probes.  As  in  the  previ¬ 
ous  example,  this  may  seem  like  a  lot,  but  it  would  require  a  much  more 
temporally  detailed  model  to  do  better.  For  example,  one  of  these  probes 
is  of  the  data  bus  signal  n289,  which  was  predicted  to  be  changing.  But 
there  is  no  distortion  of  the  data  signals  that  could  account  for  the  observed 
distortion:  the  basic  problem  is  that  the  rate  at  which  the  output  signal  is 
changing  is  higher  than  expected  —  the  data  values  are  getting  strobed  too 
fast.  This  can  only  be  caused  by  a  clock  running  too  fast  or  some  defect 
in  the  burst  detection  counters.  In  fact  that  is  just  what  is  happening:  the 


3.2.7 Summary of the Audio Decoder Examples

The  Audio  Decoder  circuit  used  in  these  troubleshooting  examples  illustrates 
the  effectiveness  and  limitations  of  temporally  abstract  models  of  circuit  be¬ 
havior.  The  functional  organization  used  in  the  model  explicitly  represents 
relationships  between  the  rates  of  change  on  the  inputs  and  outputs  of  com¬ 
ponents.  These  signal  features  are  easy  for  the  troubleshooter  to  observe, 
and  so  define  an  appropriate  vocabulary  with  which  to  describe  their  be¬ 
havior.  These  temporally  coarse  behavior  descriptions  are  associated  with 
the functional organization of the circuit. For example, the three chips U10,
U11, and U22 not only have their own behavior descriptions, but there is a
temporally coarse description of CSB01, the composition of all three. The
temporally coarse descriptions are adequate for troubleshooting many of the
possible failures, although there are cases for which a more temporally detailed
model would provide the same diagnoses with fewer probes, and others
for which the temporally coarse observations and models cannot provide a
unique diagnosis no matter how many probes are done.


3.3  Input  Encoder  Examples 

The purpose of the Console Controller Board is to transmit keystrokes
(both up- and down-transitions) and mouse activity to the 3600 host computer.
In addition, certain keystroke sequences starting with a down transition
on the “local” key cause changes in local display parameters, such as
the brightness of the screen. The section of the board responsible for these
activities is the Input Encoder. In the following sections the structure and
behavior of the Input Encoder will be presented in more detail than the
simplified view given in Chapter 1. The troubleshooting examples that
involve it illustrate how temporal abstractions drastically simplify reasoning
about devices with sequential feedback and internal state, so much so that
model-based troubleshooting can apply to board-scale digital circuits.

3.3.1  Functional  Organization  of  the  Input  Encoder 

The Input Encoder merges three streams of data from the console peripherals
and encodes them in packets to be sent to the host (Figure 3.10). The three
information streams are:

• Each up- and down-transition on the keys of the main keyboard is
encoded as a single packet.

•  An  auxiliary  numeric  keypad  with  fewer  keys  than  the  main  keyboard 
can  be  attached  that  produces  up-  and  down-transitions,  also  encoded 
as  single  packets. 

•  Each  change  of  mouse  position  or  position  of  its  three  buttons  causes 
a  packet  to  be  sent  to  the  host. 


Figure 3.10: Input Encoder Functional Organization

Transmission of packets is accomplished by the Input Processor (denoted
U  in  Figure  3.10),  which  polls  the  keyboard,  keypad,  and  mouse,  asserting  its 
interrupt  line  (int)  whenever  a  key  transition  or  mouse  motion  has  occurred. 
When  the  int  signal  is  asserted,  the  Console  Controller  C  will  respond  by 
asserting  the  read  signal  RD  a  few  instructions  later.  If  the  interrupt  response 
time  from  the  Console  Controller  is  small  enough  (a  few  microseconds),  a 
packet  is  correctly  transmitted  from  U  to  C. 

The  Console  Controller  C  interprets  some  keystroke  packet  sequences  as 
local  commands;  for  example,  the  sequence  “Local  key  down,  B  key  down, 
B  key  up,  Local  key  up”  will  increase  the  brightness  of  the  console  screen. 
Other  incoming  packets  are  sent  on  to  the  host. 

In  addition  to  the  power  and  ground  inputs  (not  shown),  the  Input  Pro¬ 
cessor  and  Console  Controller  both  require  a  two-phase  5  Mhz  clock  signal, 
denoted  c5mhz  in  Figure  3.10.  These  clocks  are  produced  by  the  Clock  Gen¬ 
erator  section  described  earlier.  The  components  involved  in  generating  and 
buffering  the  clocks  are  similar  to  those  encountered  earlier: 

• The two-phase clock generator TP01 converts a single-phase clock signal
into two clock signals 180 degrees mutually out of phase.

•  The  frequency  dividers  FD02  and  FD03  convert  an  incoming  signal  with 
frequency  n  into  one  with  frequency  j. 

•  The  frequency  buffers  FB01,  FB02,  and  FB03  produce  output  signals 
with  the  same  frequency  as  their  inputs. 

Finally,  both  U  and  C  also  have  an  active-low  “reset”  input.  When  the 
reset  button  signal  is  asserted  and  the  clock  signal  n257  is  running,  the  Reset 
Hold  Counter  (denoted  R)  asserts  n700  for  at  least  100ms,  which  initializes 
both  the  Input  Processor  and  Console  Controller. 

3.3.2  Physical  Organization  of  the  Input  Encoder 

The Input Encoder implementation centers around an Intel 8035 microprocessor
[Intel86]. Communication with the mouse and keyboard is done
through a dedicated Intel 8741 microprocessor with onboard erasable program
memory. The functional subcomponents of the Input Encoder are each
implemented by one or more chips as shown in Figure 3.11 and Figure 3.12:

Figure 3.12: Input Encoder Physical Organization


•  The  Input  Processor  is  implemented  by  the  Intel  8741  microprocessor 
chip  U34. 

•  The  Console  Controller  is  implemented  by  the  physical  8035  processor 
U33  along  with  its  external  PROM  (U18)  and  the  buffers  for  its  external 
bus  data  and  control  signals,  involving  chips  U7,  U8,  U9,  U13,  and  U24. 

• The Reset Hold Counter is implemented by the 14-bit counter chip U14
and some NAND gates on chip U31.

• The remaining functions are implemented by the inverters and JK
flipflops on chips U32 and U30.

3.3.3  Expected  Behavior  of  the  Input  Encoder 

A simple test of the Input Encoder consists of pressing and releasing the
Reset button, then rolling the mouse around. The expected behavior of the
Input Encoder is as follows:

• While the reset button is pressed the output of the Reset Hold Counter
is held low. With the clock input n257 running at 153 Khz, the signal
goes high 100ms after releasing the button.

• The low-to-high transition on n162 causes the Input Processor - with
its clock input running at 5 Mhz - to go from the “stop” state to the
“run” state in which it transmits keyboard and mouse transitions to
the Console Controller.

• The low-to-high transition on n162 causes the Console Controller - with
its clock input running at 5 Mhz - to go from the “reset” state to the
“init” state and then to the “monitor” state in which it responds to
interrupts and transmits incoming packets to the host.

•  Each  inch  of  mouse  motion  causes  the  Input  Processor  to  interrupt 
the  Console  Controller,  and  because  the  Console  Controller  is  in  the 
“monitor”  state  it  is  responding  to  interrupts  and  a  mouse  position 
update  is  sent  to  the  Console  Controller. 

•  The  Console  Controller  sends  the  mouse  position  update  to  the  host. 
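
The 100 ms figure in the first item can be checked roughly against the
counter width and clock rate given earlier. The sketch below assumes (the
text does not say so) that the hold ends once the 14-bit counter has counted
through all of its states; the function name is illustrative only.

```python
# Rough check of the reset hold time.
# Assumption (not stated in the text): the Reset Hold Counter output goes
# high once the 14-bit counter has counted 2**14 clock cycles.
COUNTER_BITS = 14      # width of counter chip U14
CLOCK_HZ = 153_000     # clock input n257

def hold_time_seconds(bits: int, clock_hz: float) -> float:
    """Time for a `bits`-wide counter to count 2**bits clock cycles."""
    return (2 ** bits) / clock_hz

hold_ms = hold_time_seconds(COUNTER_BITS, CLOCK_HZ) * 1000.0
print(f"{hold_ms:.1f} ms")  # about 107 ms, the same order as the quoted 100 ms
```

Under that assumption the computed delay is close to the stated 100 ms,
which is all the temporally coarse model needs.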

Each  of  these  high  level  behaviors  has  implications  for  the  activity  of 
certain  observable  signals.  The  important  ones  for  this  example  are: 

• The reset signal n700 will be low, then go high. The signal n162 does
the reverse, since u31d is an inverter.

• The active-low interrupt output of U will stay high while the mouse
is still, and will be rapidly asserted and deasserted while the mouse
moves.

• The select signal for the 8741 (n226) will remain high except for a short
time after U interrupts. While in any state other than the “reset” state
the read signal RD (n81) and other bus signals will have frequencies
dependent on the input clock rate of the Console Controller.

3.3.4  Finding  a  faulty  Input  Processor 

Suppose  that  upon  rolling  the  mouse  around,  the  mouse  cursor  at  the  host 
does  not  move.  This  is  recorded  as  a  discrepancy  at  the  output  of  E.  The 
model  predicts  that  the  transition  should  have  been  sent  if  all  sixteen  chips 
were  working,  but  since  it  was  not,  the  conflict  (U7,  U8,  U9,  U13,  U14,  U15, 
U16,  U22,  U23,  U24,  U25,  U30,  U31,  U32,  U33,  U34)  results.  There  are 
sixteen  candidates  above  threshold,  the  top  few  of  which  are  shown  below 
(transcript in Appendix A.10). The notation U25Open means that one of the
known fault modes for U25, called “Open,” is consistent with the observations
so far; [U25Open] comes out on top because that failure of U25 is likelier than
any other chip failure, as discussed in the Clock Generator scenario:


Weight   Candidate    Note

         [U25Open]    Oscillator chip
0.102    [U33]        8035 Microprocessor
0.102    [U34]        8741 Microprocessor
0.072    [U16]        PROM
...      ...          ...
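
The relationship between conflicts and candidates used throughout this
example can be sketched as a small hitting-set computation: a candidate is
a set of components that intersects every conflict. This is a generic
illustration, not XDE's implementation; the function name, the size limit,
and the second (toy) pair of conflicts are assumptions made for the example,
and candidate weights and thresholds are left out.

```python
from itertools import combinations

def candidates(conflicts, max_size=2):
    """Minimal component sets that intersect every conflict (hitting sets)."""
    components = sorted(set().union(*conflicts))
    found = []
    for size in range(1, max_size + 1):
        for combo in combinations(components, size):
            cand = set(combo)
            if any(smaller <= cand for smaller in found):
                continue  # skip supersets of candidates already found
            if all(cand & conflict for conflict in conflicts):
                found.append(cand)
    return found

# The sixteen-chip conflict quoted in the text.
first_conflict = {"U7", "U8", "U9", "U13", "U14", "U15", "U16", "U22", "U23",
                  "U24", "U25", "U30", "U31", "U32", "U33", "U34"}
single_faults = candidates([first_conflict], max_size=1)

# A made-up pair of conflicts, only to show how a second conflict prunes.
toy = candidates([{"A", "B", "C", "D"}, {"C", "D", "E"}])
```

With a single conflict every member is a single-fault candidate, which is
why sixteen chips yield sixteen candidates here. In the toy case only C and
D survive as single faults, while A or B can only be blamed together with E.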

Among many predictions made by the model, the following ones are about
observable signals. In this example, the frequencies of single- and two-phase
clocks are taken to be observable.

Node     Signal     Expected   Support

n178     Interrupt  changing   {U25,U32,U30,U26,U31,U14,RN6,RN7}
n226     U Select   changing   {U25,U32,U30,U26,U31,U14,U31,U33,RN6,RN7}
n162     Reset      hi-lo-hi   {U25,U32,U30,U26,U31,U14,RN6,RN7}
n700     Reset      lo-hi-lo   {U25,U32,U30,U26,U31,U14,RN6,RN7}
n137     Write      27 Khz     {U25,U32,U30,U25,U31,U14,RN6,RN7,U33}
n257     153Khz     153 Khz    {U25,U32,U30,U26,RN6,RN7}
c5mhz    Clock      5 Mhz      {U25,U32,U30,RN6,RN7}
c5mhzi   U Clock    5 Mhz      {U25,U32,U30,RN6,RN7}
c5mhzh   C Clock    5 Mhz      {U25,U32,U30,RN6,RN7}
...      ...        ...        ...

XDE suggests n178, the interrupt line, as the most informative probe;
more specifically, it suggests that the signal be probed to see whether it
changes while the mouse is being moved.

This probe selection is the single most interesting inference in this example,
and it is important to understand why it was made. In a purely
mechanistic sense, XDE suggested the interrupt line because if a discrepancy
were observed there, the conflict (U25, U32, U30, U26, U31, U14, U31, RN6,
RN7) would result, thereby (approximately) halving the candidate set. From
a modeling point of view, the interesting point is that a crude, temporally
abstract model of the behavior of the Input Processor is adequate to infer
that so long as U is working properly, has power and clock inputs, and is
not being reset, motions of the mouse will activate the interrupt line.
Similarly, if keys were being pressed, the interrupt line would again be active.
Abstracting the 8741 microprocessor to a two-state device makes prediction
of its behavior in this example much simpler than instruction-level
simulation, and still provides predictions that are diagnostically useful.
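
The "halving" intuition can be illustrated with a simple stand-in scoring
rule: prefer the probe whose prediction is supported by components carrying
about half of the current candidate weight. XDE's actual probe-ranking
procedure is not described in this section, so the rule, the function names,
and the eight equally weighted components below are all assumptions made
for illustration.

```python
def split_score(weights, support):
    """Distance from an even split of candidate weight; 0.0 is a perfect half."""
    total = sum(weights.values())
    inside = sum(w for comp, w in weights.items() if comp in support)
    return abs(inside / total - 0.5)

def best_probe(weights, predictions):
    """Pick the probe whose support set splits the candidate weight most evenly."""
    return min(predictions, key=lambda node: split_score(weights, predictions[node]))

# Hypothetical data: eight equally likely single-fault candidates.
weights = {f"c{i}": 1 / 8 for i in range(1, 9)}
predictions = {
    "p_half":   {"c1", "c2", "c3", "c4"},                     # splits 4 / 4
    "p_narrow": {"c1"},                                       # splits 1 / 7
    "p_wide":   {"c1", "c2", "c3", "c4", "c5", "c6", "c7"},   # splits 7 / 1
}
chosen = best_probe(weights, predictions)
```

Here the probe supported by half of the candidates is chosen, since either
outcome of the measurement would remove a large share of the suspects.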

Returning to the example, suppose n178 is observed to be a constant 1.
This yields (U25, U32, U30, U26, U31, U14, U31, RN6, RN7) as a conflict,
and the top four candidates are as shown below. [U25Open] comes out on top
as before:

Weight   Candidate    Note

0.280    [U25Open]    Oscillator chip
0.212    [U34]        8741 Microprocessor
0.085    [U14]        14-bit counter in Reset Hold Counter
0.085    [U26]        4-bit counter in FD03


This  yields  a  new  set  of  predictions  from  among  which  XDE  will  select 
the  next  probe. 


Node     Signal    Expected   Support

n162     Reset     hi-lo-hi   {U25,U32,U30,U26,U31,U14,RN6,RN7}
                   lo         {U34,U25,U32,U30,RN6,RN7}
n700     Reset     lo-hi-lo   {U25,U32,U30,U26,U31,U14,RN6,RN7}
                   hi         {U34,U25,U32,U30,RN6,RN7}
n257     153Khz    153 Khz    {U25,U32,U30,U26,RN6,RN7}
                   0 Hz       {U25Open,U26,U30,U32,RN6,RN7}
c5mhz    Clock     5 Mhz      {U25,U32,U30,RN6,RN7}
                   0 Hz       {U25Open,U32,U30,RN6,RN7}
c5mhzl   U Clock   5 Mhz      {U25,U32,U30,RN6,RN7}
                   0 Hz       {U25Open,U32,U30,RN6,RN7}
c5mhzh   C Clock   5 Mhz      {U25,U32,U30,RN6,RN7}
                   0 Hz       {U25Open,U32,U30,RN6,RN7}
...      ...


Note that node n162 now has two conflicting predictions for its behavior:
the normal behavior, and the misbehavior that it is low at moments when
it was expected to be high. The argument for the latter behavior is as follows.
If U34 is working properly, U has a clock and incoming mouse motions. But
since the int output was not asserted, U must have been
in the “reset” state. Hence the reset input n162 must be asserted (low).
This is the second most interesting inference in this example, and again, it is
effective because the Input Processor has been reduced to a two-state device:
only when the component models are so simple is it reasonable for the system
to make inferences about component inputs from knowing their outputs.
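
A minimal sketch of such a two-state model shows why both directions of
inference are cheap. The state names “stop” and “run” come from the text;
the function names, the boolean encoding of signals, and the use of None
for "no conclusion" are assumptions made for this illustration and are not
XDE's representation.

```python
from enum import Enum

class State(Enum):
    STOP = "stop"
    RUN = "run"

def processor_state(has_clock: bool, reset_asserted: bool) -> State:
    """Forward model: the processor runs only if clocked and not held in reset."""
    return State.RUN if has_clock and not reset_asserted else State.STOP

def interrupt_active(state: State, mouse_moving: bool) -> bool:
    """Forward model: a running processor interrupts whenever the mouse moves."""
    return state is State.RUN and mouse_moving

def infer_reset_asserted(working, has_clock, mouse_moving, interrupt_seen):
    """Backward inference from output to input.

    Returns True when the reset input must have been asserted, or None
    when the observations do not force any conclusion.
    """
    if working and has_clock and mouse_moving and not interrupt_seen:
        return True  # the only remaining reason for a silent interrupt line
    return None

# The situation in the text: U34 assumed working, clock present,
# mouse moving, but no interrupt activity observed.
conclusion = infer_reset_asserted(True, True, True, False)
```

With only two states and a handful of inputs, the single silent-interrupt
case can be enumerated by hand; an instruction-level model would offer no
comparably small set of explanations.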

The highest ranked probe is the input clock to the Reset Hold Counter, n257,
which is expected to have a frequency of 153 Khz if {U25, U32, U30, U26,
RN6, RN7} are all working. When this signal is probed, it is found to have
the correct frequency.

This observation has two major consequences. First, it makes the likeliest
candidate [U25Open] inconsistent. Second, although it makes no new
predictions, it does add new support to some predictions already present.
For example, it was already believed that n167 had frequency 2.5 Mhz if
{U25, U30, RN6, RN7} were working; it can now be deduced that it has
frequency 2.5 Mhz if {U26, RN6, RN7} are working. Similarly c5mhz has
frequency 5 Mhz if {U26, U30, RN6, RN7} are working, and so on. These
inferences result in a new conflict, (U26, U30, U32, U34, RN6, RN7), so that
the resulting highest ranked candidates are:


Weight   Candidate    Note

0.332    [U34]        8741 Microprocessor
0.132    [U30]        Frequency dividers
0.132    [U14]        Gates in Reset Hold Counter
0.116    [U32]        Frequency buffers
0.116    [U31]        Counter in Reset Hold Counter
0.083    [RN6]        Pullups
0.083    [RN7]        Pullups

The reset signal n162 is now the highest ranked probe. Probing it shows
that it is behaving normally: it starts out high, then goes low while the reset
button is pressed, then returns high a short time later. Our observation
technology is sufficiently crude that it is impossible to say exactly when
the line went low; the essential observations are that (i) it was asserted
long enough to reset U and C, and (ii) it was unasserted while the mouse
was rolling around. Nevertheless, for simplicity, the observation that gets
recorded is that n700 was high and low at just the times expected. There
are now just five candidates:


Weight   Candidate    Note

0.449    [U34]        8741 Microprocessor
0.184    [U30]        Frequency dividers
0.162    [U32]        Frequency buffers
0.118    [RN6]        Pullups
0.118    [RN7]        Pullups

Next, an observation is suggested at c5mhz. Doing so reveals that it has
the expected frequency of 5 Mhz. After several more corroborating probes of
clock signals, new conflicts are discovered and candidates eliminated.
Eventually the only remaining candidates are:

Weight   Candidate    Note

0.800    [U34]        8741 Microprocessor
0.200    [RN7]        Pullups

A  final  corroborating  probe  at  node  n57  (not  shown  in  Figure  3.10)  results 
in  the  sole  candidate: 

Weight  Candidate  Note 

1.000  [U34]  8741  Microprocessor 

This  example  is  the  same  as  that  presented  in  Chapter  1  and  has  the  same 
moral:  what  is  interesting  about  it  is  the  contrast  between  the  simplicity  of 
the  reasoning  and  the  relatively  few  probes  (eleven,  to  be  exact)  required 
to  isolate  the  fault  to  a  single  chip,  in  spite  of  the  underlying  complexity  of 
the  circuit.  What  made  that  simplicity  possible  was  the  choice  of  behavioral 
abstractions, in particular the temporally coarse behavior models for the
microprocessors, which made it possible to reason about the reset and interrupt
signals  without  getting  swamped  in  details. 

3.3.5  Finding  a  faulty  Console  Controller 

The preceding example illustrates the important characteristics of the behavior
models for the Input Encoder examples. Another example illustrates how
the program isolates faults inside the functional component C (transcript in
Appendix A.11).

The initial inputs and symptoms are the same as before, so the interrupt
signal n178 is suggested as the next probe point. This time, it is observed
to be changing while the mouse is rolled around. This suggests that the
Input Processor U is working normally. Probing the clock signal
n257 shows that its frequency is normal, suggesting that the Clock Generator
section is working normally as well. This leaves twelve candidates, the top
five of which are inside the Console Controller C:

Weight   Candidate    Note

0.165    [U33]        8035 Microprocessor
0.115    [U18]        PROM
0.082    [U7]         Instruction Address Latch
0.082    [U9]         Buffer
0.082    [U8]         Buffer

The behavior of the 8035 microprocessor inside E is described in a temporally
coarse fashion, just as the 8741 microprocessor was in the previous
example. The 8035 is either in the “run” or “stop” state, depending on the
frequency of its clock input and whether its reset input is asserted. While
running, it should be repeatedly asserting the signal PSEN, which reads
instructions from the PROM. If the PROM and some other buffers are all
working properly, then the Read and Write bus control signals should be
repeatedly asserted as well. The top ranked probes are shown here:

Node     Signal   Expected   Support

n81      Read     changing   {U7,U8,U9,U30,U32,U33,RN6}
n137     Write    changing   {U7,U8,U9,U30,U32,U33,RN6}
n11      PSEN     changing   {U30,U32,U33,RN6}


After  observing  that  none  of  these  three  signals  are  changing,  there  are 
just  four  candidates: 


Weight   Candidate    Note

0.488    [U33]        8035 Microprocessor
0.195    [U30]        Frequency dividers
0.171    [U32]        Frequency buffers
0.122    [RN6]        Pullups

Two  subsequent  probes  of  the  clock  inputs  to  the  8035  microprocessor 
U33  show  that  they  are  normal  and  leave  U33  as  the  only  candidate. 

As before, the temporally coarse model of the behavior of the microprocessor
and its combined behavior with the PROM and other components
allowed a few simple probes (nine, in this example) to find the broken
microprocessor.

3.4 Summary of Troubleshooting Scenarios

The seven scenarios presented above provide context and a set of examples
that the next few chapters will draw upon. They also illustrate that the
troubleshooting engine XDE is able to deal with complex devices not due to any
major innovation in the underlying model-based troubleshooting technology,
but rather due to innovations in constructing the device model that it uses.
XDE works well on the Console Controller Board because the board can be
modeled with the goal of troubleshooting explicitly in mind, and this implies
certain desirable features of that model. Temporally coarse descriptions of
behavior are obviously important, but there are others. The following three
chapters will present in detail the representations of circuit structure, circuit
behavior, and faults that all together can represent complex devices in a way
that makes it feasible for XDE or any other model-based troubleshooting
engine to troubleshoot them.

Chapter 4

Representing  Circuit  Structure 


Model-based troubleshooting requires an explicit representation of the internal
structure of the device being diagnosed. All the diagnoses that the
troubleshooting engine produces will be expressed in terms of the components
that appear in that structure representation. The need for efficiency indicates
several desirable properties of this structure representation: it should
be a strict hierarchy, its leaves should correspond to the locations of possible
failures, and every field replaceable component should correspond to some
node in the hierarchy. These properties are embodied in a representation of
the physical structure of the device. Predicting the behavior of a complex
device from the details of its physical organization can be greatly simplified
by using a representation of the intended behaviors of groups of components
at multiple levels of abstraction. For example, it is easier to reason about
the behavior of a digital logic gate than about the equivalent collection of
resistors and transistors: the structural composition of those components
enables abstraction of their combined behavior. For the same reason, it is easier
to reason about an adder performing arithmetic on integers than about the
equivalent collection of digital logic gates, and so on. A nonstrict functional
hierarchy provides a way of organizing these structural compositions to which
intended behaviors are attached. The nodes in the functional hierarchy are
essentially “slices” through the physical structure [Sussman77] [Sussman80].
They are chosen explicitly to facilitate behavioral abstraction.

These two views of digital circuit organization are concretely expressed in
the circuit structure language BASIL. BASIL descends from DPL [Batali81]
and TDL [Davis83] and it inherits the idea of representing circuit structures
as graphs of objects with connections between them at “ports,” although
BASIL is implemented quite differently. BASIL provides predicates and a
vocabulary of primitive components, but more important than BASIL itself
are the principles for composing these primitives into physical and functional
organizations in ways that facilitate troubleshooting. Two key principles are:

• Components in the representation of physical structure should correspond
to the possible repairs.

•  Structural  composition  should  allow  simplification  of  behaviors  and 
facilitate  behavioral  abstraction. 


4.1  Physical  Organization 

A  representation  of  the  internal  physical  organization  of  devices  is  essential  in 
model-based  troubleshooting.  The  physical  world  is  where  the  observations 
that  the  troubleshooting  engine  requests  and  the  repairs  that  it  recommends 
are  located;  the  physical  world  is  also  the  source  of  information  about  the 
plausible  failures. 

4.1.1  Primitive  Components 

To represent the physical structure of a device for troubleshooting, the first
and central issue is choosing the primitive level of detail. Since the complexity
of the world is to be abstracted away into a graph of components and
their connections, the essence of the choice is in where to draw those primitive
component boundaries. Drawing these boundaries makes three fundamental
commitments. First, it makes some failures indistinguishable to the
troubleshooting engine: every failure inside a primitive component will result
in the same diagnosis. Second, it makes some failures representable only
as failures in multiple components: for example, the troubleshooting engine
would diagnose a short circuit between two (supposedly) non-interacting
components as failures in both components. Third, the lower the level of
physical detail the more work will be involved in predicting behavior: for
example, representing individual transistors on a chip implies the possibility
that the behavior of each individual transistor will be reasoned about
explicitly. Thus there is a tradeoff to be made between the detail in the
representation and the efficiency of reasoning with it: more detail makes
diagnosis more accurate but results in more work. BASIL, like any other structure
representation, is a compromise between these conflicting goals.

BASIL uses etches, pins, and chiplets (areas of silicon real estate inside
the chip package) as its primitive components. Figure 4.1 shows a cross
section of a chip soldered into a board. The etches, pins, and chiplet are
all components. The principles at work in choosing these as primitives are
discussed below.


Figure  4.1:  Chip  Cross  Section 


• Collect fault locations with indistinguishable effects into a single component.

Electrical signals travel between the etch and the silicon inside chips
through a solder joint at the hole, the pin on the chip, and a tiny bonding
wire that reaches from the pin to a metal pad on the silicon. Opens and
shorts can happen to the pin proper, the bonding wires, and the solder; the
bonding wire is especially susceptible to being shaken loose and becoming an
open circuit. Under the assumption that only the voltages and currents at
the solder joint will be observable, these physical failures are indistinguishable.
Thus they are all treated as one component, called a pin. The pin has
a port at each end, referred to here as the solder and the bond ports.

• Collect many individually unlikely fault locations into a single component.

The metal strips that run between the holes in a board are called etches.
They are usually tree-structured when connecting more than two holes.
Sometimes branches of the etches crack (becoming open circuits) or get
accidentally connected to other etches (becoming short circuits or “bridging”
faults). Such failures are somewhat less likely than the bonding-wire breaks
mentioned above. BASIL thus represents an entire metal etch, no matter
how many branches it has, as a single component with one port at each
hole. There is no distinction between cracks in different branches of the etch;
any real break will be diagnosed as a failure of the entire etch. There is
also no representation of the physical adjacency of different etches and no
way to explicitly represent bridging faults; real shorts between etches will be
misdiagnosed as a pair of failures in the two etches.

BASIL  could  represent  each  branch  and  junction  of  the  etch  explicitly, 
and  could  represent  the  points  of  possible  bridging  explicitly,  but  this  would 
entail an unreasonable number of primitive components. It would be inefficient
since these faults are not nearly as common in the field as others. An
alternative  would  be  to  represent  the  possible  points  of  failure  implicitly  by 
representing  the  three-dimensional  layout  of  the  etches;  this  has  not  been 
done  either. 

The internal structure of chip packages provides another example of this
principle. Every transistor on a silicon chip may produce a detectably different
misbehavior if it fails, but any individual failure is relatively unlikely.
Hence each independent functional unit on the silicon within a chip is a primitive
component, referred to as a chiplet. For example, a 74LS04 chip has six
inverters on it; each of these inverters is a separate chiplet within the chip.

4.1.2 BASIL

BASIL represents the types of components and their relationships using four
predicates. The basic syntax is Cambridge prefix predicate calculus using
[...] to indicate predicate terms and (...) to indicate function terms.
The syntax is inherited from JOSHUA [Rowley87].

The predicate ako forms the lattice of types. [ako ?x ?y] means that
all individuals of type ?x are also of type ?y. For example, etches are a kind
of component: [ako etch component]. The predicate ako* is the Kleene
star of ako.

Among the primitive types of component are etch, chiplet, pin,
inverter, resistor, and switch. Figure 4.2 shows a small portion of the
type hierarchy.


Figure 4.2: Abbreviated AKO hierarchy


The predicate isa denotes the most specific types of an individual. For example,
u32a is a physical realization of an inverter in silicon; hence it is both
a chiplet and an inverter. These are denoted [isa u32a chiplet] and
[isa u32a inverter]. The predicate isa* denotes the relationship between
an individual and all of the types to which it belongs. Thus [isa* ?x ?z] ⇔
[isa ?x ?y] ∧ [ako* ?y ?z]. For example, u32a is a component because
it is an inverter, and inverters are components: [isa* u32a component].
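
The relationship between ako, ako*, isa, and isa* can be sketched outside
of JOSHUA as plain set lookups. The following Python fragment is only an
illustration of the definitions above, not BASIL's implementation; the fact
sets hold a small assumed portion of the type hierarchy, and the function
names are invented for the example.

```python
# A small, assumed portion of the ako lattice: (subtype, supertype) pairs.
AKO = {
    ("etch", "component"),
    ("chiplet", "component"),
    ("inverter", "component"),
    ("connection", "component"),
    ("pin", "connection"),
}

# isa facts: (individual, most specific type) pairs.
ISA = {("u32a", "chiplet"), ("u32a", "inverter")}

def ako_star(x, y, ako=AKO):
    """Reflexive, transitive closure of ako: x is y, or reaches y via ako links."""
    seen, frontier = set(), [x]
    while frontier:
        t = frontier.pop()
        if t == y:
            return True
        if t in seen:
            continue
        seen.add(t)
        frontier.extend(parent for child, parent in ako if child == t)
    return False

def isa_star(individual, type_, isa=ISA, ako=AKO):
    """[isa* x z] holds when some [isa x y] exists with [ako* y z]."""
    return any(ako_star(t, type_, ako) for ind, t in isa if ind == individual)
```

For instance, isa_star("u32a", "component") is true by way of either of
u32a's two most specific types, while isa_star("u32a", "etch") is false.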

The chip cross-section in Figure 4.1 showed the following set of isa relations:

[isa n197 etch]  [isa (pin 4 u32) pin]   [isa u32a chiplet]
[isa n165 etch]  [isa (pin 12 u32) pin]  [isa u32a inverter]

Components interact with other components through ports. By convention
a port denoted (?direction ?id ?component) is a port of that component.
The direction function is one of in, out, or bi, indicating that the port
is intended to be an input, output, or bidirectional port respectively. For
example, (in a u32a) is the “a” input of inverter u32a, and (bi 2 n119) is
the port where etch n119 electrically interacts with pin 4 of U32. The predicate
has-port denotes this relationship; for example, u32a has an “a” input:
[has-port u32a (in a u32a)]. The set of assertions about ports shown in
the chip cross-section of Figure 4.1 is:

[has-port n197 (bi 2 n197)]  [has-port n165 (bi 1 n165)]
[has-port u32a (in a u32a)]  [has-port u32a (out y u32a)]

Connections are a kind of component that has exactly two ports. Each of
these ports is shared with one other component. The only kind of connection
shown so far is the pin; pins are named (pin ?number ?chip).* For example,
in the chip cross-section of Figure 4.1, pin 4 of chip U32 connects port
2 of etch n119 to input port “a” of inverter u32a. This is denoted with the
predicate conn as [conn (pin 4 u32) (bi 2 n119) (in a u32a)]. Note
that in BASIL the only substantive difference between ordinary components
and connections is that the names of the ports of a connection refer to adjacent
components, not the connection itself. The connections shown in the
chip cross-section of Figure 4.1 are the two pins:

[conn (pin 4 u32) (bi 2 n197) (in a u32a)]
[conn (pin 12 u32) (bi 1 n165) (out y u32a)]

BASIL  has  other  predicates  and  a  more  densely  populated  ako  hierarchy 
than  indicated  here.  These  details  will  be  presented  shortly. 
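
Because each port term names its owning component, the conn assertions are
enough to recover which components are adjacent. The sketch below encodes
the two pin connections of Figure 4.1 as Python tuples; the tuple encoding
and the helper names are assumptions made for illustration, with node names
written as n197 and n165.

```python
# conn facts as (connection, port1, port2); ports are (direction, id, component).
CONN = [
    (("pin", 4, "u32"), ("bi", 2, "n197"), ("in", "a", "u32a")),
    (("pin", 12, "u32"), ("bi", 1, "n165"), ("out", "y", "u32a")),
]

def port_owner(port):
    """By convention the last element of a port term is its component."""
    return port[2]

def neighbors(component, conn=CONN):
    """Components reachable from `component` across exactly one connection."""
    result = set()
    for _name, p1, p2 in conn:
        a, b = port_owner(p1), port_owner(p2)
        if a == component:
            result.add(b)
        elif b == component:
            result.add(a)
    return result

adjacent_to_inverter = neighbors("u32a")
```

The inverter chiplet u32a is adjacent to both etches, one through each pin,
which is the path a signal takes in the cross-section.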

*Most components are named by a single atom such as u32. Pins are the sole exception,
since names like (pin 4 u32) are function terms. They could just as easily have been
named by atoms, for example “u32.4.”




4.1.3  The  Physical  Part-Of  Hierarchy 

The predicates and primitive component types in BASIL allow an entire circuit
board to be described in terms of the subparts of its chips and the
connectivity among them, but it would be inefficient to troubleshoot a large
circuit using only this primitive level of detail. Almost any symptom alone
would yield dozens of pins, etches, and chiplets as suspects. A hierarchic
representation allows groups of primitive components to be efficiently treated
as a single component. For example, it is more efficient to diagnose to the
level of chips before considering the internals of those chips, since there are
far fewer chips to consider than pins and chiplets. The predicate ppart-of
(“physical part of”) denotes the relationship that forms the physical hierarchy;
[ppart-of u32a u32] means that u32a is a part of u32.

The right physical components to group together to form the ppart-of
hierarchy are the ones that correspond most directly to repair actions. The
main objective of the troubleshooter is to find the repair or set of repairs
most likely to make the device work again. Since the troubleshooting engine
computes diagnoses that correspond to sets of components, it would be efficient
to have a one-to-one correspondence between the possible repair actions
and the components in the hierarchy. This would make each diagnosis map
directly to a set of repairs to be done, and the troubleshooting engine would
not waste effort distinguishing between different faults that had the same
repair. In the circuit domain this is straightforward, since for the failures under
consideration the possible repair actions consist only of replacing boards,
replacing chips, and re-soldering broken etches. By making the hierarchy of
components a physical hierarchy in which chips and boards are the only components
other than the primitives, the diagnoses will be directly translatable
into possible repairs. In the digital circuit domain the resulting hierarchy is
bushy; one or more chiplets and their pins together form a chip, and chips
and etches together form the board. Figure 4.3 shows a small portion of the
physical part-of hierarchy of the Console Controller Board.

Manufactured artifacts can nearly always be decomposed into a part hierarchy
that is strict, a decomposition that reflects the way the artifact was
constructed. Chips are fabricated separately and soldered into the board,
for example, and this indicates that the chips and printed board have no
shared parts. There are exceptions whenever the assembly process itself
causes boundaries to be diffuse. Parts may be built up by incremental and
overlapping manufacturing steps, as with the layers of a silicon chip layout;
parts may merge smoothly into one another, as with pieces of metal welded
together. As long as the physical object can be divided along boundaries
there is at least a degenerate strict hierarchy to be found: all of those parts
can be immediate descendants of the overall structure. These exceptional
and degenerate cases do not occur in digital circuit boards at the level of
description that BASIL uses. Each pin and chiplet is part of just one chip,
each bit of solder is part of one etch, and so on. The same would be true
for larger scale organizations of boards, card cages, cabinets, and so on: the
way the artifact gets assembled from its parts forms the physical part-of
hierarchy. Even cables between different cabinets are not an exception; they
customarily have their own part numbers and are typically listed in the parts
list of the entire computer. The physical hierarchy in BASIL is strict, and that
accurately represents the real world.

Figure 4.3: A Portion of the ppart-of Relation

The fact that the physical hierarchy is strict simplifies comparing alternative
diagnoses. It need not be strict (the troubleshooting engine would
still compare diagnoses and rank them appropriately), but a strict hierarchy
makes it more efficient.

For troubleshooting, each component has a status indicating whether it
is believed to be working normally, faulty in some known way, or faulty
in an unknown way. The predicate status-of denotes this relation. For
example, [status-of u32 working] means the component U32 is working,
that is, it is not physically damaged. Because BASIL assumes that only
primitive components can fail, the status of each other component can be
deduced from the status of the components that are part of it, or that it is
part of. In the example above, the board is working if all its chips and
etches are working; the chip U32 is working if all its pins and the six
inverters inside it are working. Contrapositively, if U32 is not working then
at least one of its pins or inverters is not working, and so on.
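The deduction of composite status from primitive status can be sketched as follows. This is only an illustration in Python, not BASIL's representation: the part table, the component names other than u32, and the status token "faulty-unknown" are hypothetical.

```python
# Hypothetical part-of table: composite -> immediate parts.
# Names other than "u32" are invented for illustration.
PARTS = {
    "board": ["u32", "etch-1"],
    "u32": ["u32-pin-1", "u32-pin-2", "u32a", "u32b"],
}

def is_working(component, primitive_status):
    """A primitive is working if its recorded status says so;
    a composite is working only if every one of its parts is."""
    parts = PARTS.get(component)
    if parts is None:  # primitive component
        return primitive_status[component] == "working"
    return all(is_working(p, primitive_status) for p in parts)

status = {name: "working"
          for name in ["etch-1", "u32-pin-1", "u32-pin-2", "u32a", "u32b"]}
all_good = is_working("board", status)

# Contrapositive: one faulty primitive makes every ancestor not working.
status["u32b"] = "faulty-unknown"
chip_ok = is_working("u32", status)
board_ok = is_working("board", status)
```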

While troubleshooting, each status and each diagnosis is assigned a relative
likelihood (as discussed in a later chapter). Computing these likelihoods
is greatly simplified if the different component statuses can be treated as
statistically independent. One way for that independence to be violated would
be for components to share parts, since a single failure in some shared part
would appear as a failure in all the sharing components (in fact, since the
probability that two parent components both fail would then differ from the
product of their individual failure probabilities, by definition their failures
are not independent). With a strict hierarchy it is trivial to determine
whether parts are shared; a pair of components can share parts only if one is
an ancestor of the other. The strict hierarchy thus simplifies computing
relative likelihoods since it can easily be arranged that no diagnosis ever
mentions failures in both a component and one of its ancestors.
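A minimal sketch of the shared-parts test, assuming the strict hierarchy is stored as a child-to-parent map. The map contents and function names are hypothetical and are not taken from BASIL.

```python
# Hypothetical strict hierarchy: each component has exactly one parent.
PARENT = {
    "u32a": "u32",
    "u32-pin-1": "u32",
    "u32": "board",
    "etch-1": "board",
}

def ancestors(component):
    """Walk the single parent chain up to the root."""
    chain = []
    while component in PARENT:
        component = PARENT[component]
        chain.append(component)
    return chain

def may_share_parts(a, b):
    """In a strict hierarchy two components can share parts
    only if they are the same or one is an ancestor of the other."""
    return a == b or a in ancestors(b) or b in ancestors(a)

def mentions_independent_failures(diagnosis):
    """True if no component in the diagnosis is an ancestor of another."""
    comps = list(diagnosis)
    return not any(may_share_parts(a, b)
                   for i, a in enumerate(comps)
                   for b in comps[i + 1:])
```

With this arrangement a diagnosis naming u32a and etch-1 passes the check, while one naming both u32 and u32a does not.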


4.2  Functional  Organization 

Although  the  physical  organization  discussed  above  is  central  to  the  trou¬ 
bleshooting  task,  the  physical  packaging  of  digital  circuits  often  has  an  almost 
accidental  nature.  Implementing  the  desired  functionality  using  off-the-shelf 
chips  typically  means  sharing  several  functions  in  one  package  (for  example, 
four  gates  on  a  chip)  or  using  only  a  portion  of  the  functions  on  a  chip  (for 
example,  using  a  universal  shift  register  with  all  its  control  inputs  tied  to 
power  or  ground).  For  efficient  reasoning  about  the  behavior  of  a  complex 
device,  it  is  useful  to  be  able  to  consider  the  combined  behavior  of  portions 
of  several  different  physical  components.  Moreover,  this  reasoning  requires 
behavioral  abstractions,  and  some  behavioral  abstractions  do  not  apply  to 
primitive  components.  For  example,  it  is  simpler  to  reason  about  a  digital 
adder operating on integers than about several one-bit adders doing their
operations on bit vectors. Structural composition of those one-bit adders
along with their interconnecting wires yields a composite component whose
behavior  can  be  described  abstractly  in  terms  of  n-bit  integers.  That  the 
one-bit  adders  reside  on  different  physical  components  is  an  accident  of  im¬ 
plementation;  together  with  their  interconnecting  wires  they  still  form  an 
adder  component. 
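The relationship between the one-bit slices and the abstract adder can be sketched as below. The function names and the least-significant-bit-first ordering are assumptions made for the illustration.

```python
def full_adder(a, b, carry_in):
    """One-bit adder slice: returns (sum bit, carry out)."""
    total = a + b + carry_in
    return total % 2, total // 2

def ripple_add(a_bits, b_bits):
    """Structural composition: slices chained through their carry wires.
    Bits are least significant first; the final carry is dropped."""
    carry = 0
    out = []
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)
        out.append(s)
    return out

def to_bits(n, width):
    return [(n >> i) & 1 for i in range(width)]

def from_bits(bits):
    return sum(bit << i for i, bit in enumerate(bits))

def abstract_adder(x, y, width=4):
    """Behavior of the composite, described on n-bit integers."""
    return (x + y) % (1 << width)
```

The abstract description never mentions carries or individual bits, which is what makes the composite simpler to reason about than its slices.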

BASIL  represents  this  knowledge  as  functional  components  augmenting 
the  physical  components  described  earlier.  Functional  components  are  simi¬ 
lar  in  many  ways  to  physical  components;  they  have  ports  and  statuses,  and 
they are organized into a hierarchy by the fpart-of relation. The primitive
components  discussed  earlier  —  etches,  pins,  and  so  on  —  are  both  physical 
and  functional  components;  the  functional  and  physical  hierarchies  thus  meet 
at  their  leaves.  This  yields  the  expanded  ako  hierarchy  shown  in  Figure  4.4. 


Figure 4.4: Expanded AKO Hierarchy

4.2.1 The Functional Part-Of Hierarchy

The  functional  part-of  hierarchy  is  not  strict,  and  has  a  much  richer  vocabu¬ 
lary  of  component  types  than  the  physical.  The  reason  for  this  is  that  there 
are  often  several  alternative  and  incomparable  ways  of  describing  even  the 
same collection of components. For example, one way to describe the
combination of JK flipflop and pullup in Figure 4.5 is as a “Toggle,” which has
a  one-bit  output;  another  way  would  be  as  a  “two-phase  clock  generator,” 
which  has  a  two-bit  output.  Both  behavior  descriptions  are  legitimate,  but 
neither  subsumes  the  other. 

Figure 4.5: JK Flipflop Unencapsulated


The Toggle is an example of a functional component that is the composition
of several primitive components: (i) a JK flipflop chiplet, (ii) the
etch that connects four of its inputs together, and (iii) the pins that connect
that etch to the chiplet. The etch, pins, and chiplet are all fpart-of the
Toggle. Figure 4.6 shows these subcomponents (rectilinear boxes), the ports
at which they interface (the black spots), and the boundaries of the Toggle
(the dotted line). Both the JK flipflop and Toggle have an explicit “power”
port that will be explained later; to avoid clutter these are not shown. Even
with this simplification Figure 4.6 may be a bit difficult to understand; the
difficulty of conveying encapsulations like this visually stems in part from the
fact that etches and pins are not usually treated as explicit components, and
from the fact that the desire to keep the boundary convex requires
distortions of the normal two-dimensional layout.


Figure  4.6:  JK  Flipflop  Encapsulated  as  a  Toggle 


The  Toggle  has  ports  just  as  the  JK  flipflop  did.  The  relationship  be¬ 
tween  the  ports  of  an  abstract  component  and  the  ports  of  its  underlying 
components  is  represented  with  the  predicate  corr  (“correspondence”): 

• [corr correspondence abstract-port . concrete-ports] means
that there is a group of one or more concrete-ports that correspond
to one abstract-port. The nature of that correspondence is denoted
by the correspondence argument. The most common correspondence
is identity, which means that the two ports are equivalent. Other
correspondences include concat (the concatenation of bits into integers),
ttl-power (a high voltage port and a ground port that correspond to
a single “power” port), and two-phase-clock (a pair of one-bit ports
at which the voltages are 180 degrees out of phase).

In the case of the Toggle, each of its three ports stands in an identity
correspondence  with  one  underlying  port  apiece.  These  ports  are  indicated 
in  Figure  4.6  where  the  dotted  line  passes  through  ports. 
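One way to picture a correspondence is as a function from the values at the concrete ports to the value at the abstract port. The sketch below covers only identity and concat; the table, the function names, and the most-significant-bit-first ordering for concat are assumptions, not BASIL's definitions.

```python
def identity(concrete_values):
    """Exactly one concrete port, equivalent to the abstract port."""
    (value,) = concrete_values
    return value

def concat(concrete_values):
    """Concatenate bits into an integer (assumed most significant first)."""
    n = 0
    for bit in concrete_values:
        n = (n << 1) | bit
    return n

# Hypothetical lookup table keyed by correspondence name.
CORRESPONDENCES = {"identity": identity, "concat": concat}

def abstract_value(correspondence, concrete_values):
    """Value seen at the abstract port, given the concrete port values."""
    return CORRESPONDENCES[correspondence](concrete_values)
```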

The power port that appears on nearly all chips introduces some
complications, since the power supplied to all the chiplets on a typical chip is
supplied through a pair of pins, “pwr” (high voltage) and “gnd.” Figure 4.7
shows the ttl-power correspondence between the ports at these pins and a
single power port shared by the whole chip. The power port for the whole
chip then stands in an identity correspondence with each of the power ports
of the chiplet components on the chip. U30, for example, is a dual JK flipflop
chip, and its two flipflop chiplets are named U30a and U30b; they each have
a power port along with their other ports (clk, for example). The advantage
of the power port is that it somewhat simplifies the behavior descriptions of
the individual components.

4.2.2  Principles  for  Structural  Composition 

Successive layers of possibly overlapping compositions can create a deep
fpart-of lattice representing many behavioral groupings in the device. Yet
it is one thing to be able to explicitly represent the hierarchic functional
organization of a complex digital circuit; it is quite another to discover the right
components to compose together and the right behavioral abstractions to use.
Starting  only  from  a  digital  circuit  schematic  and  behaviorally  detailed  de¬ 
scriptions  of  the  physical  component  behaviors,  someone  or  something  must 
construct  that  richer  representation.  Currently  it  is  constructed  by  hand, 
but  importantly,  not  in  an  ad  hoc  fashion.  There  is  a  fundamental  principle 
at  work: 

•  Structural  composition  should  enable  behavioral  simplification. 

Figure 4.7: Power Ports of Chip U30 and its Chiplets
[Figure: ports (in PWR u30) and (in GND u30) stand in a TTL-POWER
correspondence with (in POWER u30), which stands in an IDENTITY
correspondence with (in POWER u30a) and (in POWER u30b); the ports
(in CLK u30a) and (in CLK u30b) are also shown.]

That is, the grouping of connected or related components together
(structural composition) is distinct from behavioral abstraction, but from a
troubleshooting  perspective,  the  only  motivation  for  structural  composition 
is  to  simplify  behavior.  For  example,  there  is  no  point  in  composing  four 
one-bit  adder  slices  together  and  calling  it  an  “adder”  unless  the  behavior 
associated  with  the  adder  takes  advantage  of  the  abstraction  that  maps  from 
vectors  of  bits  to  integers.  The  Toggle  is  a  worthwhile  functional  component 
because  its  behavior  is  much  simpler  than  the  whole  JK  flipflop.  In  digital 
circuits,  there  are  three  ways  the  general  principle  manifests  itself  and  hence 
three  reasons  to  introduce  structural  compositions: 


1. To suppress constant signals. For example, if a node is pulled up and
always supplies a “high” value to some component, the pullup and
component can be grouped together to form a simpler component.


2.  To  encapsulate  reconvergent  signals.  Reconvergent  signals  are  signals 
that  originate  from  a  common  source,  and  then  are  recombined  to  pro¬ 
duce  some  other  signal.  Such  structures  can  cause  difficulties  for  pro¬ 
grams  that  reason  about  circuit  behavior  through  local  propagations. 
A simple example is shown in Figure 4.8. In the unencapsulated version,
purely local propagation cannot deduce from A=1 and C=1 that
F must be 0. Encapsulating the fanout of B and its reconvergence
alleviates the problem.


Figure  4.8:  Encapsulating  Reconvergence 


3.  To  encapsulate  loops.  Digital  circuits  often  perform  computations  se¬ 
quentially  with  a  loop  of  combinational  circuitry  and  registers  that  store 
intermediate  results.  The  encapsulation  of  combinational  circuitry  and 
registers  may  have  a  combined  behavior  that  is  simpler  to  reason  about 
than  that  for  all  the  individual  components.  In  a  sense  this  is  a  special 
case  of  encapsulating  reconvergence.  Figure  4.9  shows  a  simple  exam¬ 
ple;  the  combined  D-flipflop  and  XOR-gate  form  a  parity  generator 
for  a  serially  encoded  input.  In  concert  with  the  appropriate  behav¬ 
ioral  abstraction  it  is  not  necessary  to  reason  about  the  clock-by-clock 
operation  of  the  combined  structure. 


Figure  4.9:  Encapsulating  a  Sequential  Loop 
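The second and third reasons can be sketched in a few lines of Python. The reconvergent circuit used here is hypothetical, since the circuits of Figures 4.8 and 4.9 are not reproduced in the text: it computes F = (A and B) and (C and not B). The parity loop assumes the flipflop starts at 0 and that its D input is the XOR of the serial input and its own output.

```python
def and3(x, y):
    """AND over {0, 1, None}, where None means 'unknown'."""
    if x == 0 or y == 0:
        return 0
    if x is None or y is None:
        return None
    return 1

def not3(x):
    return None if x is None else 1 - x

def local_f(a, b, c):
    """Gate-by-gate (local) propagation through the hypothetical circuit."""
    d = and3(a, b)
    e = and3(c, not3(b))
    return and3(d, e)

def encapsulated_f(a, b, c):
    """Behavior of the encapsulated fanout and reconvergence of B:
    B and not-B can never both be 1, so F is 0 for every input."""
    return 0

def parity_clock_by_clock(bits):
    """Simulate the D-flipflop and XOR loop one clock at a time."""
    q = 0
    for d in bits:
        q = q ^ d
    return q

def parity_abstract(bits):
    """Abstract behavior of the encapsulated loop: parity of the input."""
    return sum(bits) % 2

unknown_b = local_f(1, None, 1)  # local propagation cannot decide F
```

Local propagation returns "unknown" for F when B has not been observed, while the encapsulated description gives 0 directly. The two parity functions agree on every input sequence, so the abstract one can stand in for the clock-by-clock simulation.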


Every  non-primitive  functional  component  that  appears  in  the  Console 
Controller Board description is motivated by, and an example of, one of these
three principles. The interesting and difficult part of the story concerns the
detection  and  formulation  of  the  appropriate  abstract  behaviors  to  go  along 
with  the  structural  compositions;  that  is  treated  in  the  next  chapter. 


Chapter  5 

Representing  Circuit  Behavior 


A  central  requirement  of  the  model-based  troubleshooting  methodology  is 
that  the  program  be  able  to  make  predictions  about  behavior  based  on  ob¬ 
servations  of  the  inputs  and  outputs  of  a  device  and  its  subcomponents. 
Making predictions requires both representation of behavior and
computational machinery to determine that, for example, “if A is an adder and its
inputs are 2 and 2, its output is 4.” In practice, a second requirement is
that  the  program  be  able  to  make  those  predictions  using  a  variety  of  differ¬ 
ent  domain-specific  abstractions,  making  compromises  between  the  precision 
and  efficiency  of  predictions  made  with  different  vocabularies.  In  the  case  of 
adder  A,  there  might  be  a  good  reason  to  represent  the  inputs  either  more 
abstractly as simply “even” or “odd” or more concretely as bit vectors. Since
troubleshooting real digital circuits means reasoning about the behavior of
components from resistors to microprocessors, the representation must be
flexible  enough  to  integrate  many  levels  of  abstraction. 

This  chapter  is  partly  about  TINT,  a  language  of  predicates  and  rules  that 
builds  on  BASIL  by  propagating  temporal  constraints  through  a  network  of 
instantiated  components.  TINT  is  a  framework  that  can  be  used  to  describe 
the  behavior  of  components  at  several  levels  of  detail.  What  is  important, 
however,  is  not  just  the  framework  itself,  but  the  rich  variety  of  abstractions 
and  component  behaviors  that  will  populate  it.  Hence  this  chapter  is  also 
about  the  abstractions  that  make  it  possible  to  represent  the  behavior  of 
complex  circuits  for  troubleshooting. 

The primitive level of abstraction is a switch level model that uses
voltages in the set {0,1} and currents in the set {-,0,+}. The switch level
model is discussed in Appendix E; for the most part the reader may assume
that  the  primitive  level  of  detail  is  the  standard  digital  model  using  voltages 
in  the  set  {0,1}.  Some  traditional  abstractions  appropriate  to  representing 
and  troubleshooting  complex  digital  circuits  are  those  that  concern  the  ma¬ 
nipulation  of  groups  of  bits  —  spatial  abstractions  that  make  it  possible  to 
describe  (for  example)  the  signal  being  carried  by  an  8-bit  bus  as  a  number 
or  an  ASCII  character  instead  of  a  bit  vector. 

Yet,  there  are  much  more  powerful  abstractions;  motivating  them  re¬ 
quires  defining  some  terminology.  The  purpose  of  a  behavior  model  in  trou¬ 
bleshooting  is  to  make  predictions  based  on  observations  of  the  device.  The 
predictions  produced  by  a  given  model  can  be  characterised  as  to  the  fidelity 
with  which  they  match  the  real  world,  their  precision,  and  the  efficiency  with 
which  they  can  be  made. 

Fidelity  is  best  illustrated  by  a  counterexample:  suppose  a  digital 
mod  16  adder  were  to  be  represented  as  if  it  did  ordinary  integer  addition. 
Presenting  the  real  adder  with  inputs  of  -8  and  -8  correctly  produces  a 
0.  The  model,  however,  predicts  that  it  should  produce  -16.  This  violates 
fidelity, and the behavior of the adder in this case would be improperly
regarded as a symptom of failure. In model-based troubleshooting, fidelity is
an  overriding  goal,  since  it  is  better  to  make  an  imprecise  prediction  than  to 
make  a  wrong  one. 
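The counterexample can be checked directly. The sketch assumes that the mod 16 adder works on 4-bit two's complement values in the range -8 to 7; the function names are hypothetical.

```python
def to_signed4(n):
    """Interpret n mod 16 as a 4-bit two's complement value (-8..7)."""
    n %= 16
    return n - 16 if n >= 8 else n

def real_adder(x, y):
    """What the device does: addition mod 16."""
    return to_signed4(x + y)

def integer_model(x, y):
    """An unfaithful model: ordinary integer addition."""
    return x + y

observed = real_adder(-8, -8)      # what a working adder produces
predicted = integer_model(-8, -8)  # what the unfaithful model expects
false_symptom = observed != predicted
```

The mismatch between the two values is exactly the kind of false symptom that a model lacking fidelity would report for a working device.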

Precision  and  loss  of  precision  in  the  predictions  made  with  a  behavior 
model  are  intimately  tied  to  the  level  of  abstraction  in  the  model.  For  ex¬ 
ample,  modeling  the  mod  16  adder  in  terms  of  the  voltages  on  its  input  and 
output  wires  would  be  more  precise  than  modeling  it  using  its  mod  16  defi¬ 
nition.  Modeling  it  in  terms  of  “negative”  and  “nonnegative”  numbers  would 
sacrifice  precision  in  two  ways.  One  of  these  ways  is  the  loss  of  precision  in 
the  numbers  themselves.  A  second  loss  of  precision  occurs  because  the  be¬ 
havior  of  the  real  adder  is  a  total  function,  but  the  behavior  with  respect 
to  “negative”  and  “nonnegative”  is  partial,  since  “negative  +  nonnegative” 
yields  an  ambiguous  result.  This  latter  special  type  of  precision  loss  will  be 
referred to as a loss of strength, that is, weakness in the behavior model.
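Loss of strength can be illustrated with a sign abstraction. For simplicity the sketch below uses ordinary integer addition and ignores the wraparound of a mod 16 adder, which is an assumption of the example; None stands for the ambiguous result.

```python
def sign(n):
    """Abstraction from integers to {"negative", "nonnegative"}."""
    return "negative" if n < 0 else "nonnegative"

def abstract_add(sx, sy):
    """Addition on signs. The concrete behavior is total, but this one
    is partial: mixed signs give no prediction (None)."""
    if sx == sy:
        return sx
    return None  # "negative + nonnegative" is ambiguous

mixed = abstract_add(sign(-3), sign(5))
```

Whenever the abstract behavior does make a prediction it agrees with the concrete result, so fidelity is kept; what is lost is the ability to predict anything in the mixed case.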

The  goals  of  precision  and  efficiency  can  be  traded  off  against  one  another: 
if  efficiency  were  of  no  concern,  the  predictions  could  always  be  very  precise; 
conversely, the less precise the predictions are, the cheaper they generally
are to make. The problem of modeling behavior for a given class of devices
requires  choosing  vocabularies  and  behavior  descriptions  that  retain  enough 
precision that real symptoms will be detectable, yet make efficient prediction
possible.  The  consequence  of  imprecision  is  diagnostic  indiscriminacy.  Thus 
the  issue  is,  what  abstractions  will  sacrifice  the  least  precision  for  the  most 
efficiency? 

Against  this  background  of  fidelity,  precision,  strength,  and  efficiency 
issues,  troubleshooting  complex  digital  circuits  motivates  abstractions  that 
sacrifice  temporal  precision.  Among  the  salient  characteristics  of  the  domain 
are  (i)  the  gap  of  several  orders  of  magnitude  between  the  temporal  granu¬ 
larity  at  which  events  occur  in  the  machine  and  the  temporal  granularity  at 
which  observations  can  be  made,  and  (ii)  the  fact  that  physical  failures  in 
digital  circuits  are  frequently  manifest  at  coarse  timescales.  These  charac¬ 
teristics  mean  that  temporal  precision  can  often  be  sacrificed  without  losing 
the  strength  needed  to  detect  symptoms.  Efficiency  is  gained  through  tem¬ 
poral  abstractions  that  make  it  possible  to  reason  about  large  numbers  of 
events  occurring  in  the  circuit  without  having  to  refer  explicitly  to  each  one. 
The  vocabulary  of  temporal  abstractions  includes  familiar  concepts  such  as 
change, sample, duration, sequence, count, cycle, and frequency.

The  advantage  of  temporal  abstractions  is  that  when  applied  to  compo¬ 
nents  and  groups  of  components  with  complex  behaviors  —  even  micropro¬ 
cessors  —  the  resulting  temporally  abstract  behaviors  can  be  exceedingly 
simple.  The  basic  idea  is  that  a  given  behavior  can  be  usefully  abstracted  if 
changes  on  its  inputs  always  result  in  changes  on  its  output.  Every  change  of 
value  on  the  input  of  an  inverter,  for  example,  results  in  a  change  of  value  on 
its  output.  Even  if  that  property  does  not  hold,  there  are  still  several  generic 
principles  for  forming  useful  partial  descriptions.  For  example,  a  temporally 
abstract  behavior  for  adders  might  relate  the  number  of  changes  on  its  in¬ 
puts  to  the  number  of  changes  on  its  outputs,  but  there  is  no  interesting 
relationship  for  the  addition  behavior  as  such:  both  inputs  could  change  si¬ 
multaneously  in  such  a  way  that  they  cancel  each  other  out.  One  of  the 
generic principles for forming useful partial descriptions is “holding an input
constant,” and in this case, if one of the inputs of an adder is held constant
all  changes  on  the  other  input  do  propagate  through.  Temporally  abstract 
behavior  descriptions  will  be  given  for  a  number  of  components  including 
gates,  counters,  and  microprocessors. 
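The change-counting idea can be sketched with signals represented as lists of samples taken at successive time points. The representation and the function names are assumptions made for the illustration.

```python
def count_changes(samples):
    """Number of times a sampled signal changes value."""
    return sum(1 for prev, cur in zip(samples, samples[1:]) if prev != cur)

def inverter(samples):
    return [1 - v for v in samples]

def adder(xs, ys):
    """Mod 16 adder applied sample by sample."""
    return [(x + y) % 16 for x, y in zip(xs, ys)]

# Every change on an inverter's input appears on its output.
bits = [0, 1, 1, 0, 1]
inverter_preserves = count_changes(inverter(bits)) == count_changes(bits)

# Holding one adder input constant lets all changes on the other through.
xs = [1, 2, 3, 3, 5]
held_preserves = count_changes(adder(xs, [4] * len(xs))) == count_changes(xs)

# Simultaneous changes on both inputs can cancel each other out.
cancelled = count_changes(adder([1, 2, 3], [3, 2, 1]))
```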

Although  the  main  purpose  of  this  chapter  is  to  present  the  details  of 
defining and reasoning with behaviors and temporal abstractions, the
underlying “modeling for troubleshooting” theme recurs several times:

• Many of the temporal abstractions to be defined are motivated by the
desire  to  explicitly  represent  easily-observed  features  of  signals. 

•  Individual  behavior  definitions  are  judged  for  usefulness  on  the  basis  of 
simplicity  and  therefore  the  tractability  of  the  prediction  problem  in  a 
real  troubleshooting  session. 

•  Many  of  the  rules  that  get  included  in  the  model  are  judged  worthwhile 
because  they  mention  observable  signals  or  can  make  predictions  over 
long  stretches  of  time. 

•  TINT  itself  is  deliberately  limited  in  its  expressive  power,  and  handles 
the  “frame  problem”  in  a  simplistic  way  —  two  engineering  decisions 
taken  because  they  keep  the  troubleshooting  engine  simple. 

With  troubleshooting  as  the  ultimate  goal,  this  chapter  considers  in  turn 
the  language  TINT,  the  representation  of  combinational  and  sequential  be¬ 
haviors,  the  explicit  representation  of  temporal  abstractions,  and  techniques 
for  constructing  temporally  abstract  behavior  descriptions  for  complex  cir¬ 
cuits. 

5.1 TINT

The behavior of circuit components is represented using TINT¹, a simple
temporal reasoning system in which rules are used to derive facts about the
values of functions of time. A function of time is called a signal; for example,
the voltage at a circuit node is a signal because its value can change over
time. In contrast to more sophisticated models of time (for example, the
interval model in [Allen84]), time is for simplicity taken to be a sparse set,
the integers divisible by a temporal granularity constant δ. Granularity can
be thought of as the smallest unit of time that is measurable by available
instruments. For the most part the rules and other definitions that follow
would remain unchanged in the limit as δ goes to 0, that is, if time were
taken to be dense. TINT provides two predicates, thru and tsame, for making
assertions about signal values:

1. [thru ?l ?u ?signal ?value] means that from the lower bound
time ?l to the upper bound time ?u inclusive, ?signal had value
?value.

2. [tsame ?l ?u ?signal1 ?signal2] means that at every time between
the lower bound ?l and the upper bound ?u inclusive, ?signal1
has the same value as ?signal2.

Any token can appear as the ?value of a signal.

Only  integers,  -oo,  and  +oo  can  appear  as  time  arguments  to  the  thru 
and  tsame  predicates.  This  use  of  timestamps  in  TINT  rather  than  sym¬ 
bolic  quantities  or  expressions  results  in  serious  limitations  as  compared  to 
other  temporal  reasoning  systems,  but  it  is  adequate  for  demonstrating  trou¬ 
bleshooting. 

5.1.1  Signals 

The ?signal arguments of the thru and tsame predicates are function terms.
For example, the term (voltage (in a u32a)) denotes the voltage signal at
port (in a u32a). The voltage function maps a port to a real-valued signal.
Functions from signals to signals will be used to define abstractions and
behaviors. Abstractions describe relationships between signals at different
levels of detail. Behaviors describe the relationships that components enforce
between their input and output signals.

¹ Timestamped INTervals.

Signals,  abstractions,  and  behaviors  are  denoted  for  concreteness  as  pro¬ 
cedures  in  a  side  effect  free  LISP  dialect  similar  to  SCHEME  [Abelson85],  as 
in  [Weise86].  These  procedures  are  not  executed  by  an  interpreter;  their  sole 
purpose  is  “mental  hygiene”:  before  writing  rules  to  make  inferences  about 
the  values  of  signals  at  various  levels  of  abstraction  it  is  important  to  know 
what  the  signals  mean.  Almost  any  other  language  could  have  been  used,  but 
the  essential  concepts  all  concern  functions,  and  SCHEME  (and  the  underly¬ 
ing  lambda  calculus)  is  a  powerful  and  familiar  representation  for  functions. 
Only  a  few  such  definitions  are  shown  in  this  section;  the  remainder  are  in 
Appendix  B.  All  obey  the  following  conventions: 

• The == symbol indicates definitional equivalence and “...” indicates
elision; for example, x == (lambda (y) ...) indicates that x is a
function of one argument whose body is not shown.

•  Capitalized  symbols  denote  function  arguments  and  lowercase  symbols 
denote  all  others. 

For  example,  a  function  like  voltage  is  primitive  and  can  be  defined  as 
shown  below.  It  maps  a  port  into  a  function  from  time  to  real  numbers: 

voltage ==
  (lambda (port)
    (lambda (time) ...))

The abstraction voltage-to-logic-level expects a function of time
whose range is the real numbers and returns yet another whose range is
{0, 1}:

voltage-to-logic-level ==
  (lambda (V)
    (lambda (time)
      (if (< (V time) 1.5) 0 1)))

The function ll takes a circuit node and yields a function of time whose
range is {0, 1}:

ll ==
  (lambda (port)
    (voltage-to-logic-level
      (voltage port)))

TINT does not use these lambda definitions directly, but rather reasons
with predicate ground terms containing composite terms built up from
primitive signals and abstractions. [thru -oo +oo (ll (in a u32a)) 1], for
example, means that the logic-level at port (in a u32a) was always 1.
[thru -oo +oo (change S) nil] means that the value of some signal S
never changed, or, literally, that the signal resulting from the application of
the change abstraction to signal S was always nil.
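For readers more at home with an executable notation, the same definitions can be written as Python closures. This is only a transliteration for illustration: the waveform table and its values are invented, and TINT itself reasons with terms rather than executing such functions.

```python
# Hypothetical waveform table: port -> function from time to volts.
WAVEFORMS = {
    ("in", "a", "u32a"): lambda time: 0.2 if time < 10 else 3.4,
}

def voltage(port):
    """Maps a port to a real-valued signal (a function of time)."""
    return WAVEFORMS[port]

def voltage_to_logic_level(v):
    """Maps a real-valued signal to a signal whose range is {0, 1},
    using the 1.5 volt threshold from the definition in the text."""
    return lambda time: 0 if v(time) < 1.5 else 1

def ll(port):
    """Logic-level signal at a port."""
    return voltage_to_logic_level(voltage(port))

port = ("in", "a", "u32a")
before, after = ll(port)(0), ll(port)(10)
```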

5.1.2  Rules 

TINT  provides  rules  that  are  used  in  data-driven  (forward  chaining)  fashion 
to  propagate  the  consequences  of  observations  of  signals.  The  following  is  a 
rule as the program would see it. It says that if x is a thing, and the value
of any signal s is known over an interval of positive duration, then the signal
obtained by applying abstraction to s has, over that interval, the value
obtained by applying fun to that value:

(defmyrule nonsense-rule (:forward)
  :p [isa =x thing]
  :s [thru =l =u =s =v]
  :f (< =l =u)
  :l (tell `[thru ,=l ,=u [abstraction =s] ,(fun =v)]))

The rules use an extension of JOSHUA syntax. The = prefix marks
universally quantified variables; :p marks trigger patterns whose matching
predicate terms will not appear in any resulting truth maintenance system (TMS)
clauses; :s marks the predicate terms that do appear in clauses; :f marks
LISP filters that must return non-nil for the rule to fire; :l marks the
LISP body of the rule; ` starts a backquoted structure template and , indicates
evaluation of a form within that template, as in Common LISP ([Steele84]
pp. 349-351). For implementation reasons, there is no distinction between
function and predicate terms; they are both denoted with [ ] syntax.

For  presentation  purposes,  however,  the  above  rule  would  be  formatted  as 
follows,  using  ?  to  indicate  variables,  omitting  details  of  truth  maintenance 
and backquoting, and retaining for clarity the distinction between predicate
and function terms:

If [isa ?x thing]
and [thru ?l ?u ?s ?v]
and (< ?l ?u)
Then [thru ?l ?u (abstraction ?s) (fun ?v)]
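The effect of firing this rule once over a set of facts can be sketched as below. Facts are represented as plain tuples, which is an assumption of the sketch; truth maintenance, shadowing, and repeated forward chaining are all omitted.

```python
def nonsense_rule(facts, fun):
    """One forward pass of the example rule over a set of fact tuples.
    Returns the newly derived thru facts."""
    has_thing = any(f[0] == "isa" and f[2] == "thing" for f in facts)
    derived = set()
    if not has_thing:  # trigger pattern [isa ?x thing] has no match
        return derived
    for f in facts:
        if f[0] != "thru":
            continue
        _, lower, upper, signal, value = f
        if lower < upper:  # the filter (< ?l ?u)
            derived.add(("thru", lower, upper,
                         ("abstraction", signal), fun(value)))
    return derived

facts = {
    ("isa", "u32", "thing"),
    ("thru", 0, 5, "s1", "v"),
    ("thru", 3, 3, "s2", "w"),  # zero duration: filtered out
}
derived = nonsense_rule(facts, fun=str.upper)
```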

5.1.3  Signal  Histories 

The  set  of  all  thru  predications  (predicate  ground  terms)  referring  to  the 
same  signal  is  called  the  history  of  the  signal.  TINT  maintains  the  following 
invariants  for  every  pair  of  predications  in  a  given  signal  history: 

• Conciseness: overlapping intervals of the same history are combined
into maximal intervals. If [thru ?l1 ?u1 ?s ?v] and [thru ?l2 ?u2
?s ?v] are both true, and the two intervals touch or overlap (that
is, (- (max ?l1 ?l2) δ) is less than or equal to (min ?u1 ?u2)),
then [thru (min ?l1 ?l2) (max ?u1 ?u2) ?s ?v] is also true. The
latter predication denotes a maximal interval. As long as it remains
true, it shadows both predications [thru ?l1 ?u1 ?s ?v] and [thru
?l2 ?u2 ?s ?v], and any other predications it subsumes. Rules never
fire on shadowed terms.

• Consistency: signals cannot have more than one value at any given
time. [thru ?l1 ?u1 ?s ?v1] and [thru ?l2 ?u2 ?s ?v2] cannot
both be true unless either their values are the same (that is,
(equal ?v1 ?v2)) or the intervals are disjoint (that is, (< ?u1 ?l2)
or (< ?u2 ?l1)). Otherwise TINT records a conflict.

TINT takes advantage of the fact that the lower bound argument ?l in
thru  predications  is  restricted  to  a  totally  ordered  set  to  organize  each  signal 
history  as  a  list  ordered  by  lower  bound.  This  makes  the  above  invariants 
relatively  easy  to  check  and  enforce. 
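A minimal sketch of the two invariants on a history kept as a list ordered by lower bound. It assumes a granularity of 1, treats intervals as touching when they are adjacent time points, and raises an exception where TINT would record a conflict; there is no truth maintenance or shadowing, so subsumed intervals are simply absorbed. All names are hypothetical.

```python
DELTA = 1  # assumed temporal granularity

class Conflict(Exception):
    """Raised where TINT would record a conflict."""

def add_thru(history, lower, upper, value):
    """Add [thru lower upper signal value] to one signal's history.
    history is a list of (lower, upper, value) maximal intervals;
    a new list, ordered by lower bound, is returned."""
    new_lower, new_upper = lower, upper
    kept = []
    for lo, up, val in history:
        overlaps = max(lo, lower) <= min(up, upper)
        touches = max(lo, lower) - DELTA <= min(up, upper)
        if val != value and overlaps:
            raise Conflict((lo, up, val), (lower, upper, value))
        if val == value and touches:
            # Conciseness: merge into one maximal interval.
            new_lower = min(new_lower, lo)
            new_upper = max(new_upper, up)
        else:
            kept.append((lo, up, val))
    kept.append((new_lower, new_upper, value))
    return sorted(kept, key=lambda interval: interval[0])

history = add_thru([], 2, 9, "t")
history = add_thru(history, 10, 19, "t")  # touches, so merged
```

After the two calls the history holds the single maximal interval from 2 to 19; adding an interval from 6 to 17 with the same value leaves it unchanged.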

A truth maintenance system is used to maintain boolean constraints
among thru (and other) predications. Ordinary implication is encoded as
a clause; for example, if X and Y together imply Z then there is a clause
¬X ∨ ¬Y ∨ Z. A “shadowed” assertion is one that is implied by other
(presumably more general) assertions and that should not trigger rule firings as
long as it is implied by those other assertions. Shadowing is implemented
as an extension to the TMS. A clause may have any of its literals marked
to be shadowed when they are the only satisfiable literal in the clause. For
example, suppose A is more general than B. Let ¬A ∨ B be a clause with
B marked to be shadowed. If A is true, then B is the only satisfiable literal,
hence B is true and shadowed. If A were marked to be shadowed as well,
then if B is false, ¬A is the only satisfiable literal, so A is false and shadowed.
Rules do not fire on predications while they are shadowed.

Figure  5.1  shows  a  simple  example  of  how  the  conciseness  and  consistency 
invariants  are  maintained  in  the  history  of  a  signal  S.  Each  rectangle  indicates 
a  predication  and  is  positioned  along  the  timeline  according  to  the  interval 
that  it  refers  to  (each  discrete  time  point  is  drawn  as  an  interval  on  the 
real  line).  Clauses  are  indicated  by  numbered  circles;  +  indicates  that  the 
attached  literal  occurs  positively,  -  that  it  occurs  negatively,  and  ()  indicates 
shadowing.  The  network  is  constructed  by  the  following  series  of  operations: 

1.  Some outside client (the troubleshooting engine, for example) asserts
both [thru 1 9 S nil] and [thru 2 9 S t]. This violates consistency
and causes a conflict, represented by clause 1. The client retracts
[thru 1 9 S nil], which the TMS then makes false.

2.  The client asserts [thru 10 19 S t], and since it overlaps with
[thru 2 9 S t] (also true), TINT creates the new predication
[thru 2 19 S t] and installs clause 2. Now [thru 2 19 S t] subsumes
the two predications it depends on, so TINT shadows them by installing
clauses 3 and 4.

3.  The client asserts [thru 6 17 S t], but it is immediately shadowed
because [thru 2 19 S t] subsumes it (clause 5).

Subsequent retractions and changes of truth value may trigger the
creation of new maximal interval predications and new clauses. For
example, if the client were now to retract [thru 10 19 S t], the TMS
would make [thru 2 19 S t] go out, unshadowing [thru 2 9 S t] and
[thru 6 17 S t]. Since the latter two overlap, TINT would then create
a new predication [thru 2 17 S t] (not shown).
Figure 5.1: TINT Signal History Example
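The effect of the conciseness invariant in this example can be approximated by an interval-merging pass. The sketch below is only an illustration in Python, not TINT's clause-based mechanism: it assumes integer time points, inclusive bounds, and that all intervals belong to true predications of the same signal with the same value. Intervals that overlap or abut (such as 2 to 9 and 10 to 19) are joined into one maximal interval.

```python
from typing import Iterable, List, Tuple

Interval = Tuple[int, int]  # inclusive (lower, upper) over discrete time points


def maximal_intervals(intervals: Iterable[Interval]) -> List[Interval]:
    """Merge overlapping or adjacent intervals into maximal ones."""
    merged: List[Interval] = []
    for lo, hi in sorted(intervals):
        if merged and lo <= merged[-1][1] + 1:
            # Overlaps or abuts the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
        else:
            merged.append((lo, hi))
    return merged


# All three assertions of the example are covered by one maximal interval.
print(maximal_intervals([(2, 9), (10, 19), (6, 17)]))
# After retracting the interval 10..19 the remaining two still merge.
print(maximal_intervals([(2, 9), (6, 17)]))
```

In TINT the subsumed predications are kept and shadowed rather than discarded, which is what allows them to reappear when a supporting assertion is retracted; the sketch recomputes the maximal intervals from scratch instead.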


5.1.4  Equality 

The behavior of simple components such as wires, buffers, and switches is
often easily expressed as a temporally quantified equality; for example, if a
switch is closed during the interval from ?l to ?u then during that time the
logic levels at its two terminals will be equal. Also, it is sometimes convenient
to give different names to the same signal, so equality between signals is a
useful notion as well. The four-place predicate tsame captures these concepts;
[tsame ?l ?u ?s1 ?s2] means that the signals ?s1 and ?s2 had the same
value at every time from ?l to ?u inclusive. In the case of different names
for the same signal, ?l and ?u are -∞ and +∞ respectively.

There are no rules with tsame as a trigger pattern, but TINT does have a
demon facility that is used to compute the transitive closure of the congruence
relation with respect to tsame assertions and unshadowed thru predications.
For example, if a is equal to b over the interval 5 to 15, then a value
known for b over any overlapping interval is propagated to the corresponding
interval of a:

[tsame 5 15 a b]

[thru 0 10 b t]

[thru 5 10 a t]

The consequences of equality of signals a and b are also propagated to
their abstractions; hence if a value had been known for (g (f a)) it would
get propagated to (g (f b)) as well:

[tsame 5 15 a b]

[thru 0 10 (g (f a)) nil]

[thru 5 10 (g (f b)) nil]
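One way to picture a single propagation step is as an interval intersection followed by a substitution inside the signal term. The Python sketch below is a hedged illustration, not the demon facility itself: terms are modeled as nested tuples, and the names substitute and propagate_tsame are hypothetical.

```python
from typing import Any, List, Tuple


def substitute(term: Any, old: Any, new: Any) -> Any:
    """Replace every occurrence of `old` inside a nested-tuple term."""
    if term == old:
        return new
    if isinstance(term, tuple):
        return tuple(substitute(t, old, new) for t in term)
    return term


def propagate_tsame(tsame: Tuple[int, int, Any, Any],
                    thru: Tuple[int, int, Any, Any]) -> List[tuple]:
    """tsame = (lo, hi, s1, s2); thru = (lo, hi, term, value).
    Returns the thru facts derivable in one step."""
    tl, tu, s1, s2 = tsame
    lo, hi, term, value = thru
    lo, hi = max(tl, lo), min(tu, hi)  # equality only holds on the overlap
    if lo > hi:
        return []
    derived = []
    for old, new in ((s1, s2), (s2, s1)):
        rewritten = substitute(term, old, new)
        if rewritten != term:
            derived.append((lo, hi, rewritten, value))
    return derived


print(propagate_tsame((5, 15, "a", "b"), (0, 10, "b", True)))
print(propagate_tsame((5, 15, "a", "b"), (0, 10, ("g", ("f", "a")), None)))
```

Applying this step repeatedly until nothing new is derived would give the closure; the redundancy the text goes on to discuss shows up here as the same fact being derivable through different chains of substitutions.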

This  is  a  brute  force  technique  in  at  least  two  respects.  When  two  names 
refer  to  the  same  signal,  it  would  be  better  to  maintain  a  single  canonical 
name  for  each  signal,  or  in  fact  for  each  function  and  predicate  term,  as  is 
done  for  example  in  [McAllester80a].  In  addition  to  this  redundancy  of  facts 
the  scheme  used  in  TINT  also  results  in  redundancy  of  derivations,  since 
the  same  fact  may  be  derivable  in  different  ways  simply  by  using  equalities 
and other rules in different orders. It would be better to control the
invocation of rules so that fewer redundant derivations are created, as is
done in BREAD [Feldman88]. The brute force technique used in TINT is only
tolerable because the language is restricted to equalities between signals, and the
consequences  are  propagated  only  for  thru  predications.  If  arbitrary  terms 
could  be  equated,  the  number  of  variant  terms  would  quickly  explode. 

5.1.5  Summary 

TINT provides predicates, rules, and a framework of signals and abstractions
that together are used to describe circuit behavior. The preceding treatment
of TINT is brief because the language itself is not particularly important. The
main concern is the vocabulary of signal types and abstractions and the
specific rules that the program will use to reason about them. The next three
sections will discuss in detail (i) the description and use of combinational
(time-independent) behaviors, (ii) the description and use of sequential
behaviors, and (iii) abstractions as embodied in TINT along with a particular
vocabulary of temporal abstractions.


CHAPTER  5.  REPRESENTING  CIRCUIT  BEHAVIOR 


5.2  Combinational  Behaviors 

BASIL components have intended behaviors that are functions from signals
to signals, and these behaviors can be translated into rules. For example,
the intended behavior of a digital inverter is tinvert:

tinvert == (lambda (S) (lambda (time) (invert (S time))))

invert == (lambda (x) (if (= 1 x) 0 1))

This definition can be translated into a rule that asserts facts about the
output signal of the inverter based on facts about its input signals.
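For readers more comfortable with a modern language, the two definitions transcribe almost directly. The Python rendering below is a sketch that assumes a signal is modeled as a function from a time point to a value; the example clock signal is made up for illustration.

```python
from typing import Callable

Signal = Callable[[int], int]  # a signal maps a time point to a value


def invert(x: int) -> int:
    """Pointwise inversion of a logic value."""
    return 0 if x == 1 else 1


def tinvert(signal: Signal) -> Signal:
    """Lift invert to signals: the output at each time is the inverse
    of the input at that same time."""
    return lambda time: invert(signal(time))


# Hypothetical input: 1 before time 3, then 0.
clock = lambda t: 1 if t < 3 else 0
inverted = tinvert(clock)
print([inverted(t) for t in range(6)])
```

The outer function takes a signal and returns a signal, which is exactly the sense in which a behavior is a function from signals to signals.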

The  intended  behavior  of  a  component  depends  on  some  collection  of 
background  conditions  —  for  example,  that  the  component  in  question  is 
“working”  (not  physically  damaged),  that  it  is  connected  to  a  power  source, 
and  so  forth.  The  conditions  currently  included  are  those  about  signals 
that  travel  over  wires  and  that  are  expected  to  be  stable  over  long  periods 
of  time.  The  condition  that  there  be  a  5  volt  drop  from  power  to  ground 
is an example; the condition that a clock of a certain constant frequency be
provided  is  another.  Conditions  relating  to  other  features,  such  as  component 
temperature,  magnetic  fields,  alpha  radiation,  and  so  forth  are  not  included 
in  the  model.  Failures  arising  from  those  sources  will  be  misdiagnosed. 

These background conditions must somehow be incorporated into the
rules. By convention, the background conditions for a component are
collected and summarized as a mode signal whose value is normal during the
intervals that all the conditions are satisfied.

For example, the following rule says that if an adder ?a is believed to be
working, then its mode is normal as long as it is getting power (the isa and
status-of predicates were defined and discussed in Chapter 4):

If [isa ?a adder]
and [status-of ?a working]
and [thru ?l ?u (power (in power ?a)) t]
Then [thru ?l ?u (mode ?a) normal]

The principal behavior rule for adders thus depends on the mode signal
having the value normal. In the following rule the signals (num ...) denote
the signals appearing at the adder ports (in 0 ?a), (in 1 ?a), and
(out 0 ?a):

If [isa ?a adder]
and [thru ?l1 ?u1 (mode ?a) normal]
and [thru ?l2 ?u2 (num (in 0 ?a)) ?v1]
and² (overlap (?l1 ?u1) (?l2 ?u2))
and [thru ?l3 ?u3 (num (in 1 ?a)) ?v2]
and (overlap (?l1 ?u1) (?l2 ?u2) (?l3 ?u3))
Then [thru (max ?l1 ?l2 ?l3) (min ?u1 ?u2 ?u3)
      (num (out 0 ?a)) (+ ?v1 ?v2)]

overlap tests whether the mentioned intervals have any point in common.
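The interval arithmetic in this rule is easy to lose among the pattern variables. The following Python sketch isolates it; it is an illustration under assumed representations (each matched predication is a tuple of lower bound, upper bound, and value), and the function names are hypothetical rather than part of TINT.

```python
from typing import Optional, Tuple

Fact = Tuple[int, int, object]  # (lower bound, upper bound, value), inclusive


def overlap(*intervals: Tuple[int, int]) -> bool:
    """True when all the intervals share at least one time point."""
    return max(lo for lo, _ in intervals) <= min(hi for _, hi in intervals)


def adder_behavior(mode: Fact, in0: Fact, in1: Fact) -> Optional[Fact]:
    """Forward rule: derive a fact about the output from the mode signal
    and the two inputs, valid on the intersection of their intervals."""
    (l1, u1, m), (l2, u2, v1), (l3, u3, v2) = mode, in0, in1
    if m != "normal":
        return None
    if not overlap((l1, u1), (l2, u2), (l3, u3)):
        return None
    return (max(l1, l2, l3), min(u1, u2, u3), v1 + v2)


# Illustrative values: mode normal on 1..80, inputs 7 on 11..50 and 12 on 21..60.
print(adder_behavior((1, 80, "normal"), (11, 50, 7), (21, 60, 12)))
```

The conclusion holds exactly on the intersection of the three premise intervals, which is why the lower bound is a max and the upper bound a min.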

The proliferation of “time” variables (six, in this rule) and all the min/max
arithmetic on them may seem like an unfortunate feature of the syntax of
TINT. Certainly macros could be written for combinational rules that capture
the cliche “the intersection of all the input intervals must be nonempty,” as in
the  rule  above.  For  presentation  purposes,  this  has  not  been  done  since  there 
are  many  sequential  behavior  rules  that  defy  such  simple  categorization.  It 
was  deemed  better  to  have  one  general  and  explicit  style  of  rule  presentation 
than  to  have  multiple  incompatible  styles. 

There are two other rules arising from the behavior definition of the
adder, not corresponding to the input/output directionality of the
component. These will be called antibehavior rules to indicate that their
direction of firing is “against” that of causality in the intended behavior of
the adder. They look very much like the previous rule, the difference being
that the first one below makes deductions about (in 0 ?a) and the second
about (in 1 ?a):

If [isa ?a adder]
and [thru ?l1 ?u1 (mode ?a) normal]
and [thru ?l2 ?u2 (num (out 0 ?a)) ?v1]
and (overlap (?l1 ?u1) (?l2 ?u2))
and [thru ?l3 ?u3 (num (in 1 ?a)) ?v2]
and (overlap (?l1 ?u1) (?l2 ?u2) (?l3 ?u3))
Then [thru (max ?l1 ?l2 ?l3) (min ?u1 ?u2 ?u3)
      (num (in 0 ?a)) (- ?v1 ?v2)]

If [isa ?a adder]
and [thru ?l1 ?u1 (mode ?a) normal]
and [thru ?l2 ?u2 (num (out 0 ?a)) ?v1]
and (overlap (?l1 ?u1) (?l2 ?u2))
and [thru ?l3 ?u3 (num (in 0 ?a)) ?v2]
and (overlap (?l1 ?u1) (?l2 ?u2) (?l3 ?u3))
Then [thru (max ?l1 ?l2 ?l3) (min ?u1 ?u2 ?u3)
      (num (in 1 ?a)) (- ?v1 ?v2)]

² This condition is semantically redundant, but makes runtime rule matching
more efficient.

Figure  5.2  shows  how  these  four  rules  concerning  the  adder  cooperate  to 
infer  signal  values,  and  how  they  interact  with  the  conciseness  condition  on 
TINT  signal  histories.  The  network  of  thru  predications  shown  was  created 
by  the  following  operations: 

1.  The predications [status-of A working] and [thru 1 80 (power
(in power A)) t] are true, so the mode rule of the adder fires and
results in the predication [thru 1 80 (mode A) normal], supported
by clause 1.

2.  The predications [thru 11 50 (num (in 0 A)) 7] and [thru 21 60
(num (in 1 A)) 12] are true, so the behavior rule for the adder fires,
resulting in the predication [thru 21 50 (num (out 0 A)) 19],
supported by clause 2.

3.  The first of the antibehavior rules for the adder fires and deduces
[thru 21 50 (num (in 0 A)) 7] by clause 3, but it is shadowed (clause 4)
by the enclosing interval.

4.  Similarly, the second antibehavior rule fires and deduces [thru 21 50
(num (in 1 A)) 12], which is immediately shadowed.

Were the newly deduced intervals not shadowed, the behavior rule for the
adder would fire one more time to deduce [thru 21 50 (num (out 0 A)) 19]
again. There is redundancy in this scheme, but without the conciseness
condition on signal histories it would be worse.
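Steps 3 and 4 can be replayed with a few lines of code. The Python sketch below is an assumption-laden illustration: facts are tuples of lower bound, upper bound, and value, and the helper names adder_antibehavior and subsumed are hypothetical. It derives an input value from the output and the other input, then checks whether the result is enclosed by a fact already in the history, which is the situation in which TINT would shadow it.

```python
from typing import Optional, Tuple

Fact = Tuple[int, int, object]  # (lower bound, upper bound, value), inclusive


def adder_antibehavior(mode: Fact, out: Fact, known_in: Fact) -> Optional[Fact]:
    """Derive the unknown input as output minus the known input, valid on
    the intersection of the three premise intervals."""
    (l1, u1, m), (l2, u2, v_out), (l3, u3, v_in) = mode, out, known_in
    lo, hi = max(l1, l2, l3), min(u1, u2, u3)
    if m != "normal" or lo > hi:
        return None
    return (lo, hi, v_out - v_in)


def subsumed(new: Fact, existing: Fact) -> bool:
    """True when `existing` has the same value over an enclosing interval."""
    return (existing[2] == new[2]
            and existing[0] <= new[0]
            and new[1] <= existing[1])


mode = (1, 80, "normal")
out0 = (21, 50, 19)
in0, in1 = (11, 50, 7), (21, 60, 12)

derived_in0 = adder_antibehavior(mode, out0, in1)
derived_in1 = adder_antibehavior(mode, out0, in0)
print(derived_in0, subsumed(derived_in0, in0))
print(derived_in1, subsumed(derived_in1, in1))
```

Both derived facts fall inside intervals that were asserted originally, so neither would trigger the forward rule again.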

The  behavior  rules  for  the  adder  serve  as  a  canonical  example  of  the 
combinational  case  —  the  output  at  any  moment  is  solely  a  function  of 
its  present  inputs.  The  behavior  of  many  other  components  appearing  in 
the Console Controller Board can be expressed in their entirety this way;
for  other  components,  portions  of  their  temporally  abstract  behaviors  are 
combinational  in  the  same  sense. 
5.3  Sequential Behaviors

The previous examples of behavior rules involved only combinational
behaviors. Sequential behaviors require introducing signals to explicitly
represent the internal states of components. As with any program for
reasoning about change, TINT encounters the frame problem [McCarthy69],
or, in more illuminating terminology, the initiation and persistence
problems [Shoham86].

The  initiation  problem  arises  from  the  need  to  specify  all  the  preconditions 
for  a  given  change  or  event  to  occur.  The  solution  in  TINT  is  to  explicitly 
represent  whether  a  component  is  physically  damaged  and  conditions  on 
incoming electrical signals, summarize them into a mode signal, and leave all
remaining  background  assumptions  implicit. 

The  persistence  problem  arises  from  the  need  to  specify  all  the  conditions 
under  which  nothing  happens,  that  is,  the  conditions  under  which  states  do 
not  change  over  time.  One  formal  solution  is  to  have  minimality  criteria 
(as  in  [Lifschitz87]  and  [Shoham86])  that  specify  which  of  many  possible 
extensions  of  an  initial  set  of  statements  are  preferred.  An  example  of  such 
a  minimality  criterion  is  to  prefer  extensions  that  have  the  fewest  number 
of  changes  having  no  known  cause.  The  validity  of  any  particular  prediction 
is  thus  relative  to  many  other  predictions  that  have  been  or  could  be  made. 
The  solution  in  TINT  is  to  make  explicit  the  persistence  conditions  for  each 
state.  The  result  is  a  rule  —  a  frame  axiom  —  for  every  state  signal  that 
mentions  every  kind  of  event  that  could  change  that  state. 

Neither of these solutions in TINT is general, since both rely on the
belief that each component interacts with few enough other components and
in few enough ways that they can all be listed explicitly. Nevertheless, they
do have the desirable property that all justifications for signal value
predictions are grounded solely in beliefs about the status of components and
the observations of the troubleshooter. Because no appeal is made to
persistence assumptions or any minimality criteria while computing the
consequences of observations, each prediction has only local justifications
and local consequences. There is thus no need for the detection and
manipulation of conflicts to be any different than for combinational behaviors.

A falling-edge triggered register provides the simplest example of
sequential behavior, involving only three rules. The first rule says that (a)
the output of the register is identical to its state, and that (b) changes
from 1 to 0 on the clock input are “interesting”:

If [isa ?r register]
Then [tsame -∞ +∞ (state ?r) (num (out 0 ?r))]
and [interesting-event (ll (in clk ?r)) (1 0)]

The value of the abstract signal (event ?from ?to ?s) is t whenever
there has been a change from the value ?from to ?to. The value of this
abstract signal is recorded explicitly only when that event type is marked as
“interesting.” Further details will be presented shortly.

The second rule is a state-transition rule. Any change from 1 to 0 on the
clock input causes the register to enter the state selected by its data input
signal (num (in 0 ?r)). The previous state of the register is irrelevant.
The rule below concludes that during (at least) the single moment succeeding
the transition, state had the value ?input:

If [isa ?r register]
and [thru ?l1 ?u1 (mode ?r) normal]
and [thru ?l2 ?u2 (event 1 0 (ll (in clk ?r))) t]
and (overlap (?l1 ?u1) (?l2 ?u2))
and [thru ?l3 ?u3 (num (in 0 ?r)) ?input]
and (overlap (?l1 ?u1) (?l2 ?u2) (?l3 ?u3))
Then [thru (+ δ ?u2) (+ δ ?u2) (state ?r) ?input]

The third rule is a persistence rule. The register stays in whatever state
it is in so long as there has been no change of the clock from 1 to 0. Its state
persists while the event is occurring as well, hence the appearance of δ in the
conclusion:

If [isa ?r register]
and [thru ?l1 ?u1 (mode ?r) normal]
and [thru ?l2 ?u2 (event 1 0 (ll (in clk ?r))) nil]
and (overlap (?l1 ?u1) (?l2 ?u2))
and [thru ?l3 ?u3 (state ?r) ?state]
and (<= (max ?l1 ?l2) ?l3 (min ?u1 ?u2))
and (not (and (= ?l3 (max ?l1 ?l2))
              (= ?u3 (+ δ (min ?u1 ?u2)))))
Then [thru (max ?l1 ?l2) (+ δ (min ?u1 ?u2))
      (state ?r) ?state]
Figure 5.3 shows these behavior rules in use. The signal denoted
(ll (in clk R)) is the clock input to a register R, and has a history of
values 1, then 0, then 1. The predications and clauses were constructed by
the following steps:

1.  Because the change from 1 to 0 has been deemed interesting, the
predication [thru 2 3 (ll (in clk R)) 1] results in clause 1 being
installed, and similarly for the clauses 2, 4, and 5. Clause 6
is installed to enforce the conciseness condition on the history of
(event 1 0 (ll (in clk R))), and this subsequently results in some
predications being shadowed.

2.  The transition rule for registers fires and results in clause 3 being
installed: the mode of the register was normal, the value at (in 0 R) was
known, and a falling edge occurred on the clock input. The conclusion
of the rule is [thru 5 5 (state R) 9].

3.  The persistence rule then fires to create clause 7 and the
predication [thru 5 9 (state R) 9], which in turn shadows [thru 5 5
(state R) 9] to ensure conciseness.

In  general,  transition  rules  deduce  that  a  component  must  have  been  in  a 
state  for  just  one  moment,  and  the  persistence  rules  subsequently  deduce  how 
long  that  state  must  have  lasted.  The  rules  for  the  register  are  particularly 
simple  because  at  a  transition  the  previous  state  of  the  register  does  not 
matter;  later  examples  consider  cases  where  it  does. 
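The division of labor between the transition and persistence rules can be sketched as two small functions. This Python fragment is a hedged illustration, not the TINT rules themselves: facts are tuples of lower bound, upper bound, and value, the time step is taken to be 1, nil is modeled as False, and the concrete intervals for the mode, the falling-edge event, and the data input are assumptions chosen only so that the conclusions agree with those stated for Figure 5.3.

```python
from typing import Optional, Tuple

DELTA = 1  # assumed length of one discrete time step
Fact = Tuple[int, int, object]  # (lower bound, upper bound, value), inclusive


def transition(mode: Fact, fall_event: Fact, data: Fact) -> Optional[Fact]:
    """A falling edge latches the data input for the single next moment."""
    (l1, u1, m), (l2, u2, e), (l3, u3, v) = mode, fall_event, data
    if m != "normal" or e is not True:
        return None
    if max(l1, l2, l3) > min(u1, u2, u3):
        return None  # the three intervals share no time point
    return (u2 + DELTA, u2 + DELTA, v)


def persistence(mode: Fact, no_event: Fact, state: Fact) -> Optional[Fact]:
    """With no falling edge over an interval, a known state extends across
    it and one step beyond."""
    (l1, u1, m), (l2, u2, e), (l3, u3, s) = mode, no_event, state
    if m != "normal" or e is not False:
        return None
    lo, hi = max(l1, l2), min(u1, u2)
    if lo > hi or not (lo <= l3 <= hi):
        return None
    if l3 == lo and u3 == hi + DELTA:
        return None  # the conclusion would add nothing new
    return (lo, hi + DELTA, s)


mode = (1, 20, "normal")              # assumed
latched = transition(mode, (4, 4, True), (3, 6, 9))
extended = persistence(mode, (5, 8, False), latched)
print(latched, extended)
```

The final guard in persistence plays the role of the negated conjunct in the rule: without it, the rule would fire again on its own conclusion.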
5.4  Abstractions

The notion of an “abstraction” takes on a specific meaning in TINT as a
function from signals to signals. Behaviors are functions from signals to
signals too; for example, tinvert represents the behavior of a boolean inverter.
It is no accident that abstractions and behaviors are syntactically identical
in TINT. Their similarity helps to illuminate the relationship between precision
and strength in behavior prediction. Given any abstraction A and behavior
B we can define a function AB that describes the abstracted behavior
(Figure 5.4). AB will usually be a partial function. As long as A is not a
one-to-one function, the predictions made by AB must be less precise than
those by B. Fidelity requires that any prediction made by AB must be the same
as that made by B; that is, let z == (B x y) and then (A z) == (A (B x y)):

For all times,
If ((AB (A x) (A y)) time) is defined
Then ((A (B x y)) time) == ((AB (A x) (A y)) time)


Figure 5.4: Abstractions and Behaviors
[Diagram: AB maps (A x) and (A y) to (AB (A x) (A y)).]


The strength of AB can be characterized by the degree to which AB is
a total function. Ideally, any prediction made by (A (B ...)) will also be
made by AB, as stated below; weakness just means that there are fewer values
of x and y for which it holds:

For all times,
If ((A (B x y)) time) is defined
Then ((A (B x y)) time) == ((AB (A x) (A y)) time)



For example, let A be the sign function that maps real numbers to
{-, 0, +}, and let B be real addition. AB is the qualitative addition
function qplus, which is partial because (AB + -) and (AB - +) are undefined
(Figure 5.5). AB in this case does not yield strong predictions.


Figure 5.5: Example of Abstractions and Behaviors
[Diagram: sign maps x, y, and (plus x y) to (sign x), (sign y), and
(sign (plus x y)); qplus maps (sign x) and (sign y) to
(qplus (sign x) (sign y)).]


Any  behavior  can  be  abstracted  using  any  abstraction.  Moreover,  there  is 
no  reason  that  the  same  abstraction  A  need  be  applied  to  all  the  signals  x,  y, 
and  z.  However,  for  an  arbitrary  combination  of  behavior  and  abstractions, 
any function AB is unlikely to be strong (that is, its result will usually be
undefined) and in fact is nearly always useless. An alternative is to make
assumptions about the relationship between x and y such that AB is stronger
over the resulting restricted domains. In the case of qualitative addition, an
example would be to assume that (sign x) and (sign y) are never -, so
that the resulting restriction of qualitative addition becomes a total function.
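The qualitative addition example is small enough to check exhaustively. In the Python sketch below the signs are encoded as strings and an undefined result as None; these encodings, and the brute-force checks over a small range of integers, are choices made for the illustration. The first check exercises fidelity (whenever qplus is defined it agrees with the sign of the real sum), and the second exercises the restriction to non-negative values, on which qplus is total.

```python
from typing import Optional


def sign(x: float) -> str:
    """The abstraction A: map a number to one of '-', '0', '+'."""
    return "+" if x > 0 else "-" if x < 0 else "0"


def qplus(a: str, b: str) -> Optional[str]:
    """The abstracted behavior AB; None marks an undefined result."""
    if a == "0":
        return b
    if b == "0":
        return a
    if a == b:
        return a
    return None  # '+' and '-' together: the sign of the sum is unknown


values = range(-3, 4)

# Fidelity: every defined prediction of AB matches A applied to B's result.
fidelity = all(qplus(sign(x), sign(y)) in (None, sign(x + y))
               for x in values for y in values)

# Strength on a restricted domain: never undefined for non-negative inputs.
total_on_nonneg = all(qplus(sign(x), sign(y)) is not None
                      for x in values for y in values if x >= 0 and y >= 0)

print(fidelity, total_on_nonneg, qplus("+", "-"))
```

The restriction trades generality for strength: nothing about qplus changes, only the set of inputs over which it is asked to predict.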

Every behavior B can also be abstracted trivially to yield strong
predictions from the identity behavior I == (lambda (Z) Z). The “trick” is to
have the abstraction of the inputs of B be the procedure B itself (Figure 5.6).
All the complexity of the behavior of B has simply been hidden in the
abstraction of its inputs. Although this particular abstraction is silly, it is just
the extreme example of a more generally useful principle: in trying to
formulate a useful behavioral abstraction, some of the behavioral complexity of B
can be shifted into the abstractions to make AB simple and strong.

Figure 5.6: Sufficiently Complex Abstractions Make Any Behavior Trivial

An example is provided by the abstracted behavior of a 4-bit counter
that increments on falling edges of its input (Figure 5.7). By temporally
abstracting its input and carry-out output with respect to the number of falling
edges on each signal, the counter can be viewed as dividing the abstracted
input by 16. The complicated definition of the abstraction “count of falling
edges” textually resembles the definition of the behavior of a counter, so in
this case it is in a quite literal sense that some of the behavior B has been
shifted into the abstraction A.
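The divide-by-16 view can be checked with a small simulation. The Python sketch below makes several assumptions that are not in the text: signals are lists of bits sampled once per time step, the counter starts at zero, and the carry-out is modeled as the most significant bit of the count, which falls exactly when the counter wraps from 15 to 0.

```python
from typing import List


def falling_edges(bits: List[int]) -> int:
    """The abstraction: number of 1-to-0 changes on a sampled signal."""
    return sum(1 for prev, cur in zip(bits, bits[1:]) if prev == 1 and cur == 0)


def counter4_msb(bits: List[int]) -> List[int]:
    """The behavior: a 4-bit counter incrementing on falling edges of its
    input; returns the most significant bit of the count at each step."""
    count = 0
    out = [0]  # the counter is assumed to start at zero
    for prev, cur in zip(bits, bits[1:]):
        if prev == 1 and cur == 0:
            count = (count + 1) % 16
        out.append(count >> 3)
    return out


clock = [1, 0] * 40  # hypothetical input with 40 falling edges
carry = counter4_msb(clock)
print(falling_edges(clock), falling_edges(carry), falling_edges(clock) // 16)
```

Note how closely falling_edges resembles the edge test inside counter4_msb; that resemblance is the literal sense in which part of the behavior has moved into the abstraction.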

Figure 5.7: The Behavior of a Counter with Respect to a “Counting” Abstraction
[Diagram: A, the count of falling edges, maps the input x to (A x); dividing
by 16 gives (/ (A x) 16), which corresponds to (A z).]
Good abstractions are not just reformulations of behaviors. Ideally, one
has a small collection of abstractions that are appropriate to a wide range of
component behaviors, appropriate in the sense of (i) sacrificing precision,
(ii) retaining strength, and (iii) increasing efficiency. Given a particular
abstraction function A, it is thus an interesting and relevant question to ask
for what class of behaviors B it is possible to formulate easily computable and
strong abstract behaviors AB, or, failing that, what reasonable assumptions
can be made to strengthen AB. Characterizing the class of behaviors for
which the abstraction is appropriate is thus a concrete way of characterizing
the utility of the abstraction.

5.4.1  Temporal  Abstractions 

Temporal abstractions are abstractions whose definition mentions previous
values of a signal. An example is stay: (stay S) is true at time only if
the signal S has the same value at (- time δ) and time. The particular
temporal abstractions to be shown have the additional property, useful in
troubleshooting, that they produce signals that are easy to observe in working
and malfunctioning circuits. Here, too, stay is an example: it is often easier to
observe whether a given signal is changing than it is to observe the value or
values that the signal is taking on.
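As a concrete reading of this definition, the Python sketch below models a signal as a function of time and takes one time step to be 1; both are assumptions for illustration, as is the sample signal. Unknown values are not modeled, so the abstracted signal should not be queried at the first time point, where the previous value is unavailable.

```python
from typing import Callable

DELTA = 1  # assumed length of one discrete time step
Signal = Callable[[int], object]


def stay(signal: Signal) -> Signal:
    """True at a time when the signal has the same value as one step earlier."""
    return lambda time: signal(time) == signal(time - DELTA)


samples = [3, 4, 4, 4, 5]          # hypothetical values at times 0..4
X = lambda t: samples[t]
stay_X = stay(X)
print([stay_X(t) for t in range(1, 5)])
```

The abstraction discards the values themselves and keeps only whether they moved, which is the loss of precision that makes it easy to observe.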

The  breadth  of  circuits  for  which  these  abstractions  are  appropriate  can 
be  briefly  characterized  as  those  with  behaviors  that  are  event-preserving 
functions  of  signals  having  known  relative  timing  relationships.  An  event  is 
a  change  in  the  value  of  a  signal.  Behaviors  are  event-preserving  to  the  extent 
that  changes  on  their  input  signals  are  reflected  as  changes  on  their  outputs 
(they  include  all  one-to-one  functions);  three  ways  that  input  signals  may 
have  a  “known  timing  relationship”  are: 

1.  Behaviors  with  single  inputs,  since  the  timing  relationship  of  a  signal 
with  itself  is  trivial. 

2.  Behaviors  with  multiple  inputs,  all  but  one  of  which  are  constant 
throughout  some  interval.  Example:  the  behavior  of  a  two-input  and- 
gate,  one  of  whose  inputs  is  known  to  be  a  constant  1  during  some 
interval. 

3.  Behaviors  with  multiple  inputs  for  which  it  can  be  assumed  there  are  no 
simultaneous  events.  Example:  the  behavior  of  a  two-input  xor-gate, 
whose  inputs  never  rise  or  fall  at  the  same  moment,  so  that  the  output 
always  changes  whenever  the  input  does.  This  is  a  particularly  strong 
assumption  to  make,  and  is  rarely  used. 
Having so severely bounded the class of behaviors for which temporal
abstractions are useful, it is tempting to conclude that the corresponding
class of digital (or other) components is so small as to be worthless. This
is not so, because it is possible to structurally compose groups of digital
components and define abstract signals in such a way that the behaviors of
the resulting aggregate components satisfy those tight requirements. Given
that freedom, the relevant class of digital circuit structures is so diverse as
to defy definition; it is only possible to present examples within that space.
After presenting some important temporal abstractions, the next section will
be devoted to just such examples. These important temporal abstractions
are change, duration, sequence, count, cycle, frequency, and sampling.

•  Change marks events. The change function is t only at moments when
the underlying signal has just changed its value, otherwise it is nil.
Stay is the obvious negation (an example of the values of these signals
over time is shown below; it and others like it follow the convention
that δ = 1, and that the more abstract the signal the closer it appears
to the top line).


(change X)   ?    t    nil  nil  t
(stay X)     ?    nil  t    t    nil
X            3    4    4    4    5
time         0    1    2    3    4

change == (lambda (S)
            (lambda (time)
              (not (equal (S time) (S (- time δ))))))

Two  familiar  numeric  elaborations  of  the  change  abstraction  are  dt 
(derivative  with  respect  to  time)  and  cross  (crossings  of  a  value  v), 
defined  in  Appendix  B. 

It is also useful to have signals that are t whenever a particular
event has just occurred and nil otherwise. The abstract signal
(event ?from ?to ?S) is t whenever the underlying signal ?S has
just changed from ?from to ?to. For example, (event 500 700 S) is
t where S has just changed from 500 to 700:


[Table of (event :any 500 S) and (event 500 700 S) over time; only the
row for (event :any 500 S) is partly legible in the source:
?  ?  nil  nil  nil  nil  t.]
A ?from argument of :any denotes the special case of any transition
to ?to, which is useful for marking the known beginning of an interval.
(event :any 500 S) is t at time 6. However, it is not known to be t
at time 1 since the value of S could have been 500 at 0.

In the domain of troubleshooting circuit boards, it is much easier
to observe whether a given single-bit signal changed or not during
an interval of several seconds than it is to observe each individual
change. The abstraction changing with-respect-to is specifically
tailored to making statements about whether a given logic level signal
ever changed, statements that typically arise from observations of the
circuit. (changing-wrt ?l ?u ?S) is t only at the upper bound time
?u and only when ?S changed at least once during the interval from ?l
to ?u inclusive:


[Table of (changing-wrt 1 6 S) and (changing-wrt 1 3 S) over time; the
entries are not recoverable from the source.]
For example, if [thru 6 6 (changing-wrt 1 6 S) t] is true it means
that S changed at least once between times 0 and 6.
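The event and changing-wrt abstractions can be sketched in the same functional style as the definition of change. The Python below is an illustration under stated assumptions: signals are functions of integer time, the time step is 1, the ANY sentinel stands in for :any, and the sample signal is invented. Unknown values (the ? entries) are not modeled, so the sketch returns a definite answer even where TINT would leave the value undetermined.

```python
from typing import Callable

DELTA = 1        # assumed length of one discrete time step
ANY = object()   # hypothetical stand-in for the :any argument
Signal = Callable[[int], object]


def event(frm: object, to: object, signal: Signal) -> Signal:
    """True at a time when the signal has just changed from `frm` to `to`."""
    def at(time: int) -> bool:
        before, now = signal(time - DELTA), signal(time)
        if now != to or before == now:
            return False
        return frm is ANY or before == frm
    return at


def changing_wrt(lo: int, hi: int, signal: Signal) -> Signal:
    """True only at time `hi`, and only if the signal changed at least once
    at some time from `lo` to `hi` inclusive."""
    def at(time: int) -> bool:
        if time != hi:
            return False
        return any(signal(t) != signal(t - DELTA) for t in range(lo, hi + 1))
    return at


samples = [300, 500, 500, 500, 700, 700, 500]   # hypothetical, times 0..6
S = lambda t: samples[t]
print(event(500, 700, S)(4), event(ANY, 500, S)(6), changing_wrt(1, 6, S)(6))
```

Because changing_wrt is true at only one time point, an observation such as "this line toggled while I watched" becomes a single predication at the end of the observation window.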

•  Duration indicates how long a signal has stayed at the same value. The
duration is defined to be δ when the signal has just changed.

(duration X)   ?    1    2    1
X              3    4    4    5
time           0    1    2    3
•  Count counts the number of events that have occurred with respect to
a window of fixed width. The function count-ww takes an argument n
that is the width of the window in units of δ, and a signal argument S.

(count-ww 3 S)   ?    ?    1    1    2    2    1    1    1
S                t    nil  nil  t    t    nil  nil  t    nil
time             0    1    2    3    4    5    6    7    8

•  The Sequence abstraction indicates when a particular string of
(possibly repeated) values has appeared contiguously on a signal. Given a
sequence like (0 1) it can be thought of as a finite string recognizer
for occurrences of the regular expression 0+ 1+.

(sequence '(0 1) S)   nil  nil  nil  t    nil  nil  t    nil  t
S                     0    1    1    0    0    1    0    1    0
time                  0    1    2    3    4    5    6    7    8

•  The Cycle abstraction is used to count the number of endings of a
particular sequence of values. The function cycles-ww is simply the
composition of the count and sequence abstractions:

(cycles-ww 3 '(0 1) S)   ?    ?    ?    1    2    1    2    1    2
S                        0    1    0    1    0    1    0    1    0
time                     0    1    2    3    4    5    6    7    8

Typically, the larger the window, the less relative fluctuation of
the cycle count over time. For example, suppose A and B are signals
that are just slightly out of phase. Then (cycles-ww n ... A) and
(cycles-ww n ... B) will have the same value most of the time, and
will never differ by more than 1.

(cycles-ww 8 .. A) |  2   2   3   2   2   3   2   2   3
(cycles-ww 8 .. B) |  2   3   2   2   3   2   2   3   2
(sequence .. A)    | nil nil  t  nil nil  t  nil nil  t
(sequence .. B)    | nil  t  nil nil  t  nil nil  t  nil
time               |  0   1   2   3   4   5   6   7   8

The larger the window, the less the relative difference, and conversely,
the easier to detect significant deviations (as for example the difference
between a signal occasionally asserted and one that is running at about
20 kHz). By convention, the window size is usually taken to be 1000
times the expected period of the signal, so that the cycles-ww of a pair
of signals can be judged as equal if they differ by no more than 0.1%,
that is, by no more than one cycle in a thousand.

• Frequency is simply the number of cycles that occurred during a
window, divided by the duration of that window. The abstraction function
fww yields the frequency of a signal with respect to a window size and a
particular sequence of values. With a sufficiently large window relative
to the cycle time (e.g. 1000 times as large), the result is an adequate
approximation to the normal notion of “frequency.”

Sometimes it is not necessary to know the actual frequency of a
signal, but simply whether the signal is changing or not. This can be
represented as the sign of the frequency.

• The notion of Sampling is essential to understanding behavior of
synchronous systems; here, the sampling of a signal refers to the values
that the signal takes on at certain (usually regularly spaced) moments.
The abstraction function sample-and-hold (abbreviated samp) takes
two argument signals V and S; V is t where the signal S is to be sampled.
The value of samp is the value of S where V was last t:

[Figure: timeline of (samp V S) against V, S, and time; the individual
values were not recoverable.]

Note that the value of (samp X X), the sampling of a signal with itself,
at a given time is the value of X the last time X was non-nil.
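
The count, sequence, and cycle abstractions above can be sketched in a few
lines of Python. This is only an illustration, not the TINT implementation:
signals are plain lists indexed by time, None stands for the "?" entries, and
the function names are hypothetical. The timing of sequence (true one step
after the matching run ends) is an assumption read off the example table. The
cycle table shows "?" at time 2 where this sketch yields 1, presumably because
the sequence value at time 0 is not yet determined there.

```python
from typing import List, Optional, Sequence


def count_ww(n: int, s: Sequence[bool]) -> List[Optional[int]]:
    """Count the true entries of s in the window of the last n time steps.

    Returns None (the '?' of the tables) until a full window is available.
    """
    out: List[Optional[int]] = []
    for t in range(len(s)):
        if t < n - 1:
            out.append(None)
        else:
            out.append(sum(1 for v in s[t - n + 1 : t + 1] if v))
    return out


def sequence(pattern: Sequence[int], s: Sequence[int]) -> List[bool]:
    """True at time t when a run matching pattern ended at time t - 1.

    Each pattern value may be repeated, so (0, 1) recognizes 0+ 1+.
    The one-step delay is an assumption inferred from the example table.
    """
    k = len(pattern)
    out: List[bool] = []
    for t in range(len(s)):
        # Collapse the history before t into its sequence of distinct runs.
        runs: List[int] = []
        for v in s[:t]:
            if not runs or runs[-1] != v:
                runs.append(v)
        ended = t > 0 and s[t] != s[t - 1]
        out.append(ended and len(runs) >= k and runs[-k:] == list(pattern))
    return out


def cycles_ww(n: int, pattern: Sequence[int], s: Sequence[int]) -> List[Optional[int]]:
    """Composition of the count and sequence abstractions."""
    return count_ww(n, sequence(pattern, s))
```

Applied to the S rows of the tables above, count_ww and sequence reproduce
the tabulated rows, and cycles_ww agrees with the cycle table from time 3
onward.
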

The interesting and important property of these temporal abstractions is
that they sacrifice precision without sacrificing the ability to detect faulty
behavior. In troubleshooting the idea is to detect discrepancies between the
observed behavior of the real device and our idealized model of it; thus the
predictions of interest are those that can be made efficiently from what we
have observed and that could be significantly violated if the device were
broken. The change abstraction is useful because it is easy to observe whether
signals in a device are changing or not, and easy to predict what the
consequences of change (or lack of it) would be. Similarly, the frequency
abstraction is useful even if frequencies are hard to observe accurately: the
distinction between zero and nonzero frequencies is easy to observe and is
likely to result in significantly different behavioral consequences. By
summarizing (possibly very long) sequences of events, temporal abstractions
make complex behaviors look simple enough for troubleshooting to be tractable.

Abstractions define how a signal such as (ll n48) (the logic level at
node 48) relates to signals “below” it such as (voltage n48), and signals
“above” it such as (fww 10^5 '(0 1) (ll n48)) (the frequency at node 48,
measured at cycles starting with 0 and with a window of 10^5 δ time units).
Abstractions thus result in rules that can fire “upward,” “downward,” or
even “sideways” between different abstractions of the same base signal. The
definitions of change and stay, for example, can yield the following rules:

If [thru ?l ?u (stay ?s) ?v]

Then [thru ?l ?u (change ?s) (not ?v)]

If [thru ?l ?u (change ?s) ?v]

Then [thru ?l ?u (stay ?s) (not ?v)]

In  practice,  however,  only  a  subset  of  the  possible  rules  should  actually 
be  made  explicit  and  included  in  the  program.  For  example,  only  one  of  the 
signals  (change  S)  and  (stay  S)  really  needs  to  be  represented  explicitly, 
so  these  two  rules  are  not  necessary. 

Furthermore, each rule should not be fired on every signal. For example,
the change abstraction applies in principle to every signal, but if every
change of value on the signal S required an explicit deduction about
(change S), an infinite regress would result: (change (change S)), and so
forth. The changing-wrt abstraction is an example of the general phenomenon
that for any given signal there are infinitely many abstractions that can be
applied to it. For example, (changing-wrt 1 5 S), (changing-wrt 2 7 S), and
so on, are all legitimate signals. Some criterion is needed for determining
which of the many possible signals TINT will make deductions about.
Consequences concerning changing-wrt and similar abstractions are only
propagated during the interval during which observations are currently being
made. TINT denotes the interval over which observations are currently being
made in terms of the lower and upper bounds of a global reference interval.
By convention the pseudo-signal GR denotes this global reference timeline;
it is t during the interval that the circuit is actually observed. Thus,
[thru ?a ?z GR t] means that observations are made with respect to the time
interval ?a to ?z inclusive. The interval ?a to ?z is referred to as the
“observation interval.” The pattern [thru ?a ?z GR t] appears in a rule to
ensure that it makes its deductions only during the current observation
interval.

Furthermore, only certain signals in a circuit are actually observable.
Again by convention, the only observable signals are taken to be those at
the solder joints of a circuit board, which are modeled in BASIL as the ports
of wire etches. Ports of etches are called holes (chip pins are placed into
them), and (hole ?i ?e) denotes the ?ith hole in etch ?e. For efficiency, the
rules dealing with observations of signals are restricted to making inferences
at these ports.

The result of these conventions and efficiency considerations is that the
following four rules suffice to make inferences among a logic-level signal and
its change and fww abstractions:

If the logic level at a hole has a constant value over the interval from ?a
to ?z, then the signal was never changing with respect to a subinterval of
observation:

If [thru ?a ?z GR t]
and [thru ?l ?u (ll (hole ?n ?e)) ?v]
and (<= ?l ?a ?z ?u)

Then [thru ?z ?z (changing-wrt ?a ?z (ll (hole ?n ?e))) nil]

If the logic level at a hole had two different values at different moments
during an observation interval ?a to ?z, then the signal is changing with
respect to that interval:

If [thru ?a ?z GR t]
and [thru ?l0 ?u0 (ll (hole ?n ?e)) 0]
and (overlap (?a ?z) (?l0 ?u0))
and [thru ?l1 ?u1 (ll (hole ?n ?e)) 1]
and (overlap (?a ?z) (?l1 ?u1))

Then [thru ?z ?z (changing-wrt ?a ?z (ll (hole ?n ?e))) t]

The frequency of a signal with respect to a window ?w implies whether
or not it should be changing with respect to the observation interval ?a to
?z (provided that the window ?w fits within the observation interval); if the
frequency is nonzero then the signal should be changing, otherwise not:

If [thru ?l ?u (fww ?w ?seq ?s) ?f]
and [thru ?a ?z GR t]
and (<= ?l ?a ?z ?u)
and (<= ?w (- ?z ?a))

Then [thru ?z ?z (changing-wrt ?a ?z ?s) (< 0 ?f)]

A signal that is not changing has a frequency of 0 with respect to any
window and sequence. The following rule says that if the logic-level signal
at a hole is not changing during an observation interval, its frequency is 0
during that interval. An additional condition is that there must have been
some previous mention of the frequency of that logic-level signal, otherwise
irrelevant frequencies would be deduced for many other signals:

If [thru ?z ?z (changing-wrt ?a ?z (ll (hole ?n ?e))) nil]
and [thru ?a ?z GR t]
and³ Signal (fww ?w ?seq (ll (hole ?n ?e))) exists

Then [thru ?a ?z (fww ?w ?seq (ll (hole ?n ?e))) 0]
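
As a rough illustration of the first two of these four rules, the following
Python sketch deduces the value of changing-wrt from a recorded history of a
logic-level signal. It is not the rule engine itself: the (lower, upper, value)
tuple standing in for a thru assertion and the function names are assumptions
made for the example.

```python
from typing import List, Optional, Tuple

# (lower, upper, value): the signal held `value` from lower to upper inclusive.
Thru = Tuple[int, int, int]


def overlaps(a: int, z: int, l: int, u: int) -> bool:
    """True if the closed intervals [a, z] and [l, u] share a time point."""
    return l <= z and a <= u


def changing_wrt(a: int, z: int, history: List[Thru]) -> Optional[bool]:
    """Value of (changing-wrt a z S) at time z, or None if nothing follows."""
    # First rule: one constant interval covers the whole observation interval.
    for l, u, _ in history:
        if l <= a and z <= u:
            return False
    # Second rule: both a 0 and a 1 were recorded during the observation.
    seen = {v for l, u, v in history if overlaps(a, z, l, u)}
    if {0, 1} <= seen:
        return True
    return None
```

With the history [(0, 4, 0), (5, 9, 1)], observing from 1 to 3 yields nil
(False) and observing from 3 to 6 yields t (True); when the history covers
only part of the observation interval with a single value, neither rule
applies and the sketch returns None.
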

Finally, a noteworthy relationship that will appear implicitly in other
rules is that a signal ?s sampled with respect to some signal ?v cannot be
changing unless the underlying signals are:

If [thru ?z ?z (changing-wrt ?a ?z (samp (fall ?v) ?s)) t]

Then [thru ?z ?z (changing-wrt ?a ?z ?s) t]
and [thru ?z ?z (changing-wrt ?a ?z ?v) t]

³ This trigger pattern is implemented with a predicate not mentioned elsewhere:
[cohistorical (fww ?w ?seq (ll (hole ?n ?e)))]

Every signal has many possible (event ...) abstractions, but only those
that will help other behavior rules to fire are worth making explicit
deductions about. The predicate interesting-event indicates which signals these
are. The predication [interesting-event ?s (?from ?to)] means that if
a change from ?from to ?to occurs on signal ?s, then (event ?from ?to ?s)
should be t:

If [interesting-event ?signal (?from ?to)]
and [thru ?l1 ?u1 ?signal ?from]
and [thru ?l2 ?u2 ?signal ?to]
and (<= ?u1 ?l2 (+ δ ?u1))

Then [thru ?l2 ?l2 (event ?from ?to ?signal) t]

When ?from is the token :any, (event :any ?to ?s) is t no matter what
the previous value of ?s was:

If [interesting-event ?signal (:any ?to)]
and [thru ?l1 ?u1 ?signal ?from]
and [thru ?l2 ?u2 ?signal ?to]
and (not (equal ?from ?to))
and (<= ?u1 ?l2 (+ δ ?u1))

Then [thru ?l2 ?l2 (event :any ?to ?signal) t]

Otherwise, (event ?from ?to ?s) should be nil. For any interval during
which ?s was constant, either (i) ?s had the value ?from, in which case
there could not have been any such event:

If [interesting-event ?signal (?from ?to)]
and (not (eql ?from :any))
and [thru ?l ?u ?signal ?v]
and (equal ?v ?from)

Then [thru ?l ?u (event ?from ?to ?signal) nil]

Or⁴, (ii) ?s had some value other than ?from, in which case no such event
could have happened during the interval starting δ after the beginning and
ending δ after the end:

If [interesting-event ?signal (?from ?to)]
and (not (eql ?from :any))
and [thru ?l ?u ?signal ?v]
and (not (equal ?v ?from))

Then [thru (+ δ ?l) (+ δ ?u) (event ?from ?to ?signal) nil]

⁴ The current implementation treats these two cases with a single rule.

(Figure 5.3 on Page 106 showed the above rules about events in use.)
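
The positive event rules can be illustrated with a small Python sketch. It
assumes a history given as time-ordered (lower, upper, value) intervals, takes
δ to be one time unit, and uses the string ":any" for the wildcard token; these
representations and the function name are assumptions for the example, not
part of TINT.

```python
from typing import List, Tuple, Union

DELTA = 1          # assumed size of the time unit δ
ANY = ":any"       # stand-in for the :any token

Thru = Tuple[int, int, int]  # (lower, upper, value), inclusive bounds


def event_times(frm: Union[int, str], to: int, history: List[Thru]) -> List[int]:
    """Times at which (event frm to S) is t, given a time-ordered history."""
    times: List[int] = []
    for (l1, u1, v1), (l2, u2, v2) in zip(history, history[1:]):
        # The second interval must begin no later than δ after the first ends.
        adjacent = u1 <= l2 <= u1 + DELTA
        from_matches = (v1 != v2) if frm == ANY else (v1 == frm)
        if adjacent and from_matches and v2 == to:
            times.append(l2)  # the event is asserted only at the new lower bound
    return times
```

For the history [(0, 2, 0), (3, 5, 1), (6, 6, 0), (7, 9, 1)], the call
event_times(0, 1, history) returns [3, 7] and event_times(":any", 0, history)
returns [6].
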

5.4.2  Composite  Abstractions 

Composite abstractions involve spatial as well as temporal abstraction. For
example, an eight-bit parallel signal is a composite of eight one-bit
logic-level signals. BASIL provides the predicate [corr ...] that indicates
where a port corresponds to an abstraction of one or more subports. Rules that
concern composite signals all trigger on occurrences of such correspondences.
For example, [corr ttl-power Z X Y] means that there is a correspondence of
type ttl-power between the composite port Z and the two ports X and Y. The
type of the correspondence between the ports implies one or more abstraction
relationships between signals at those ports. The ttl-power correspondence,
for example, implies that the abstract signal (power Z) is equivalent to the
signal (one-and-zero (ll X) (ll Y)), where:

one-and-zero ≡
  (lambda (A B)
    (lambda (time)
      (and (eql (A time) 1) (eql (B time) 0))))
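
For readers less familiar with the lambda notation, the same higher-order
function can be written in Python. Signals are modeled here as functions from
a time to a value; the Signal alias and the example signals are illustrative
assumptions.

```python
from typing import Callable

Signal = Callable[[int], int]  # hypothetical alias: time -> logic level


def one_and_zero(a: Signal, b: Signal) -> Callable[[int], bool]:
    """Return the signal that is true exactly when a is 1 and b is 0."""
    return lambda time: a(time) == 1 and b(time) == 0


# Example: the power pin stays at 1; the ground pin floats to 1 from time 5 on.
power_pin: Signal = lambda t: 1
ground_pin: Signal = lambda t: 0 if t < 5 else 1
powered = one_and_zero(power_pin, ground_pin)
```

Here powered(0) is True and powered(5) is False, mirroring a component that
loses its ground connection at time 5.
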

A power input of t is just shorthand for having the appropriate voltage
drop between the power and ground inputs to the device. The following rule
says that if a component has power then its power and ground are logic levels
1 and 0 respectively:

If [corr ttl-power (in power ?a) ?p ?g]
and [thru ?l ?u (power (in power ?a)) t]

Then [thru ?l ?u (ll ?p) 1]
and [thru ?l ?u (ll ?g) 0]

In principle, rules could be written to enforce many relationships between
the composite signal and its subsignals; only a few concerning the temporal
abstraction frequency have yet been implemented.

The abstraction two-phase-clock, for example, yields rules relating the
frequency occurring on the two-phase clock to the frequencies of its
subsignals. If the frequency of a two-phase clock signal is nonzero then each
of the underlying signals has that same frequency. Since the underlying signals
are out of phase, one of them has its frequency measured with respect to
the cycle '(0 1) and the other with respect to '(1 0); which is which
depends on whether the two-phase clock frequency was measured with respect
to '(nil t) or '(t nil):

If [corr two-phase-clock-encoding ?clk ?c1 ?c2]
and [thru ?l ?u (fww ?w '(?b ?a) (cc ?clk)) ?f]
and (< 0 ?f)

Then [thru ?l ?u (fww ?w (if ?b '(1 0) '(0 1)) (ll ?c1)) ?f]
and [thru ?l ?u (fww ?w (if ?b '(0 1) '(1 0)) (ll ?c2)) ?f]

Conversely, if the frequency of either subsignal is zero then the frequency
of the composite signal is zero as well:

If [corr two-phase-clock-encoding ?clk ?c1 ?c2]
and [thru ?l ?u (fww ?w '(?a ?b) (ll ?c2)) 0]

Then [thru ?l ?u (fww ?w (if (eql 0 ?a) '(nil t) '(t nil))
                      (cc ?clk)) 0]

If [corr two-phase-clock-encoding ?clk ?c1 ?c2]
and [thru ?l ?u (fww ?w '(?a ?b) (ll ?c1)) 0]

Then [thru ?l ?u (fww ?w (if (eql 1 ?a) '(nil t) '(t nil))
                      (cc ?clk)) 0]

A similar relationship holds between a synchronous serial signal and the
pair of one-bit logic-level signals that comprise it, denoted by the
correspondence clocked-serial. In this case, the frequency of the serial
signal, as measured by the rate of zero crossings, can be used to determine
whether the underlying logic-level signals are changing. The essential
relationship is that the frequency of zero crossings on the composite signal
can be no greater than the frequency of changes of the underlying serial data
signal sampled with respect to the clock:

• If [corr clocked-serial ?s ?d ?c], then:

(fww w '(nil t) (cross 0 (cs ?s)))
  ≤
(fww w '(nil t) (change (samp (fall (ll ?c)) (ll ?d))))


From the fact that both (ll ?c) and (ll ?d) must be changing for (samp
(fall (ll ?c)) (ll ?d)) to be changing, this relationship can be used to
form the following rule, which says that if the frequency of the composite
signal is positive during the observation interval, then both the clock and
data signals are changing:

If [corr clocked-serial ?s ?d ?c]
and [thru ?l ?u (fww ?w '(nil t) (cross 0 (cs ?s))) ?f]
and (< 0 ?f)
and [thru ?a ?z GR t]
and (<= ?l ?a ?z ?u)

Then [thru ?z ?z (changing-wrt ?a ?z (ll ?c)) t]
and [thru ?z ?z (changing-wrt ?a ?z (ll ?d)) t]

Conversely, if either of the underlying signals is not changing then the
frequency of the abstract signal must be zero:

If [corr clocked-serial ?s ?d ?c]
and [thru ?l ?u (ll ?d) ?v]

Then [thru ?l ?u (fww ?w '(nil t) (cross 0 (cs ?s))) 0]

If [corr clocked-serial ?s ?d ?c]
and [thru ?l ?u (ll ?c) ?v]

Then [thru ?l ?u (fww ?w '(nil t) (cross 0 (cs ?s))) 0]


A more complex version of the relationship between (cs ?s) and its
subsignals applies to multi-bit parallel buses. If the signal on an n-bit bus
is known to be changing in such a way that the different values it takes on
include both values below and above 2^(n-1), then the most significant bit
must be changing:

If [corr bus-with-csl-and-wrl ?bus
         ?cs ?wr ?msb . ?others]
and [thru ?l ?u (fww ?w '(nil t) (cross ?n (cp ?bus))) ?f]
and (< 0 ?f)
and (eql ?n (expt 2 (length ?others)))

Then [thru ?z ?z (changing-wrt ?a ?z (ll ?msb)) t]

The 12-bit bus in the Audio Decoder, for example, carries 12-bit values,
and if the frequency of crossings of 2^11 on that bus is nonzero, then the
most significant bit of the bus must be changing.
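
The reasoning behind this rule is simple binary arithmetic, which the
following Python sketch checks. The helper names are hypothetical, and "above"
is taken here to include the threshold value 2^(n-1) itself, an assumption
made so that every n-bit value falls on one side or the other.

```python
from typing import Iterable, List


def crosses(threshold: int, values: Iterable[int]) -> bool:
    """True if the values include one below and one at or above threshold."""
    vals: List[int] = list(values)
    return any(v < threshold for v in vals) and any(v >= threshold for v in vals)


def msb(value: int, width: int) -> int:
    """Most significant bit of an unsigned value of the given bit width."""
    return (value >> (width - 1)) & 1


def msb_changes(values: Iterable[int], width: int) -> bool:
    """True if the most significant bit differs among the values."""
    return len({msb(v, width) for v in values}) > 1
```

For a 12-bit bus the threshold is 2**11 = 2048: the values 1000 and 3000 cross
it, so their most significant bits differ, whereas 100 and 2000 do not.
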

5.4.3  Summary  of  Abstractions 

In TINT, abstractions are functions from signals to signals, and in principle
any abstraction can be applied to any signal or signals to produce yet another
signal. TINT can represent the time-varying values of any of these signals,
and uses rules to make inferences about signals at other levels of abstraction.
Temporal abstractions are among the most useful because temporally abstract
signals are among the easiest for the troubleshooter to observe. Using
this as a guiding principle, the rules that map between levels of abstraction
are written for the most part so as to limit the inferences about signals to
those that are observable.

The real utility of the temporally abstract signals, however, is that it is
possible to reason about the behavior of circuit components using them. But
faced with a digital circuit and the above collection of temporal abstractions,
it is not always obvious how the behavior of the circuit should be described
with those abstractions, nor even which portions lend themselves to such
a description. This model-building process is not automated, but can be
metaphorically understood as “parsing” the circuit schematic: grouping
components into composite structures and abstracting signals, sometimes hiding
them completely. An essential ingredient of the parsing is “knowing what
the circuit is for,” that is, its purpose. Heavy use of teleological knowledge
is made throughout the entire parsing description. The other essential
ingredient is “knowing what the model is for.” The model is for
troubleshooting, and heavy use of that fact is made too. The four basic
principles by which behaviors are temporally abstracted are:

1. Event Preservation: some component behaviors lend themselves to
temporal abstractions without modifications or new assumptions.

2. Reduction: a temporally abstract behavior that only covers part of
a component behavior is better than not covering any at all.

3. Synchronization: some digital circuits have signals that provide timing
information, and the sampling abstraction can simplify the behavior
of components to which they are connected.

4. Encapsulation: after grouping components together, their combined
behavior may lend itself to temporal abstraction using the previous two
techniques even if the individual component behaviors did not.

These  principles  are  treated  individually  in  the  following  four  sections. 

5.5  Event  Preservation


A behavior is event preserving to the extent that certain types of changes on
its input signals result in changes on its output signal. All one-to-one
functions are perfectly event preserving and result in abstracted behaviors
that are strong. The tinvert function that describes the behavior of a boolean
inverter is a simple example (Page 98). The tinvert behavior is event
preserving because it is a one-to-one function. Abstracting the behavior of a
one-to-one function with the abstraction stay always results in the identity
function. In particular, (stay (tinvert X)) ≡ (identity (stay X)).

The abstracted behavior identity with respect to stay is not by itself
a useful result, but similar derivations apply to the cycles-ww and fww
abstractions:

(cycles-ww n '(0 1) S) ≡ (cycles-ww n '(1 0) (tinvert S))
(cycles-ww n '(1 0) S) ≡ (cycles-ww n '(0 1) (tinvert S))

(fww n '(0 1) S) ≡ (fww n '(1 0) (tinvert S))
(fww n '(1 0) S) ≡ (fww n '(0 1) (tinvert S))

The identity behavior that results from the fww abstraction is useful
because predictions about the frequencies of signals can be made over long
intervals of time, summarizing many underlying events without having to
refer to each one individually.
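
The cycle-count identities for the inverter can be checked numerically. The
sketch below is a self-contained illustration under two assumptions: sequence
reports a match one step after the matching run ends, and partial windows at
the start of the timeline are counted rather than reported as unknown. The
function names mirror the text but are not taken from the implementation.

```python
from typing import List, Sequence


def tinvert(s: Sequence[int]) -> List[int]:
    """Pointwise boolean inversion of a 0/1 signal."""
    return [1 - v for v in s]


def sequence(pattern: Sequence[int], s: Sequence[int]) -> List[bool]:
    """True at t when the runs ending at t - 1 match pattern (assumed timing)."""
    out: List[bool] = []
    for t in range(len(s)):
        # Distinct consecutive values seen strictly before time t.
        runs = [v for i, v in enumerate(s[:t]) if i == 0 or s[i - 1] != v]
        ended = t > 0 and s[t] != s[t - 1]
        out.append(ended and runs[-len(pattern):] == list(pattern))
    return out


def cycles_ww(n: int, pattern: Sequence[int], s: Sequence[int]) -> List[int]:
    """Number of pattern endings in the last n steps (partial windows counted)."""
    seq = sequence(pattern, s)
    return [sum(seq[max(0, t - n + 1) : t + 1]) for t in range(len(s))]


S = [0, 1, 1, 0, 0, 1, 0, 1, 0]
same_01 = cycles_ww(3, (0, 1), S) == cycles_ww(3, (1, 0), tinvert(S))
same_10 = cycles_ww(3, (1, 0), S) == cycles_ww(3, (0, 1), tinvert(S))
```

Both same_01 and same_10 are True, because inversion maps every run of 0s to
a run of 1s of the same length and vice versa.
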

These relationships between the logic levels at the inputs and outputs
of inverters are simple to encode in rules. The inverter has a rule (like all
boolean gates) that says if it is working and it has power, then its mode is
normal:

If [isa ?x inverter]
and [status-of ?x working]
and [thru ?l ?u (power (in power ?x)) t]

Then [thru ?l ?u (mode ?x) normal]

The behavior and antibehavior of the inverter can be captured in two
rules:

If [isa ?x inverter]
and [thru ?l1 ?u1 (mode ?x) normal]
and [thru ?l2 ?u2 (ll (in a ?x)) ?v]
and (overlap (?l1 ?u1) (?l2 ?u2))

Then [thru (max ?l1 ?l2) (min ?u1 ?u2)
           (ll (out y ?x)) (- 1 ?v)]

If [isa ?x inverter]
and [thru ?l1 ?u1 (mode ?x) normal]
and [thru ?l2 ?u2 (ll (out y ?x)) ?v]
and (overlap (?l1 ?u1) (?l2 ?u2))

Then [thru (max ?l1 ?l2) (min ?u1 ?u2)
           (ll (in a ?x)) (- 1 ?v)]

The behavior of the inverter, being a one-to-one function, is event
preserving, and there are potentially several temporal abstractions appropriate
for describing its behavior. Rules could be written for the inverter using the
abstractions change, stay, cycles, fww and so forth, but changing-wrt is
chosen because it refers to easily observable abstract signals. The rule about
whether the signal is changing simply says that during the observation
interval, the output is changing if and only if the input is changing:

If [isa ?x inverter]
and [thru ?l ?u (mode ?x) normal]
and [thru ?a ?z GR t]
and (<= ?l ?a ?z ?u)

Then [tsame ?l ?u (changing-wrt ?a ?z (ll (in a ?x)))
                  (changing-wrt ?a ?z (ll (out y ?x)))]

An inverter can be used to implement a “frequency buffer.” The input
and output frequencies of a frequency buffer are the same. However, when
the underlying signal has been inverted, incoming '(0 1) cycles come out as
'(1 0) cycles, and the rule must take this into account. The following rule
says that the frequency of the output with respect to a particular cycle is
the same as the frequency of the input with respect to the inverse of that
cycle; the rule does not fire unless there has been some mention of the
relevant input signal frequency and cycle:

If [isa ?d frequency-buffer]
and [thru ?l1 ?u1 (mode ?d) normal]
and Signal (fww ?w '(?a ?b) (ll (in a ?d))) exists

Then [tsame ?l1 ?u1 (fww ?w '(?a ?b) (ll (in a ?d)))
                    (fww ?w '(?b ?a) (ll (out y ?d)))]

A pair of inverters may also form a frequency buffer for a two-phase
clock signal. However, the effect of the inversion of the underlying signals is
to make the output cycle start a quarter phase later than the input cycle.
Over a large number of cycles the phase shift makes little difference in the
frequency. Thus the rule says that the frequencies of the cc signals at the
input and outputs are the same, provided that there has been some mention
of the input frequency:

If [isa ?d frequency-buffer]
and [thru ?l1 ?u1 (mode ?d) normal]
and Signal (fww ?w '(?a ?b) (cc (in a ?d))) exists

Then [tsame ?l1 ?u1 (fww ?w '(?a ?b) (cc (in a ?d)))
                    (fww ?w '(?a ?b) (cc (out y ?d)))]

A larger and more interesting class of behaviors than one-to-one functions
consists of those for which a subset of input events always results in some
output event. For example, a toggle is a flip-flop that changes its state on
every falling edge of its clock input. This behavior can be described with the
function toggle, which is event preserving with respect to falling edges. Its
input, ranging over {0, 1}, has two possible events: rising and falling
edges. In any sequence of input events, a fixed subset (about half) will be
falling edges. Whenever a falling edge occurs on the input, the output has
either a rising or falling edge.

(toggle L) | 0 1 1 1 1 0 0
L          | 1 0 0 1 1 0 0
time       | 0 1 2 3 4 5 6


It would be useful to have a strong temporally abstract version of the
toggle behavior; the problem is finding a temporal abstraction that will
work. stay does not work, but cycles-ww does. As noted earlier, any
behavior can be combined with any abstraction to yield an abstracted
behavior. Unlike one-to-one functions, partially event preserving behaviors
abstracted with stay do not yield strong functions. For example, by a
derivation similar to that for tinvert, all that can be shown is that for all
times, ((stay S) time) → ((stay (toggle S)) time), that is, the output never
changes if the input does not. This is not strong, because it makes no
prediction if the input is changing. Partially event preserving behaviors may,
however, yield strong functions when abstracted with temporal abstractions
other than stay. In the case of toggle in particular, the behavior derived
for the cycles-ww abstraction is a total function, by using the additional
fact that the value of S is 1 or 0 at all times: the count of occurrences of the
sequence l = '(0 1) or l = '(1 0) on the output is approximately half that on
the input⁵:

(* 2 ((cycles-ww n l (toggle S)) time)) ≤
((cycles-ww n l S) time) ≤
(+ 1 (* 2 ((cycles-ww n l (toggle S)) time)))

By substitution using the definition of fww, for a sufficiently large value
of n the following approximate relation holds at all times:

(* 2 ((fww n l (toggle S)) time)) ≈ ((fww n l S) time)

Event preservation is not a property solely of a behavior; if the behavior
is not a one-to-one function it might be necessary to make use of additional
information about the input signal to the function. This may be either
through an assumption about the signal, or (as in the frequency divider
case) through an intrinsic property the signal possesses by virtue of its type.

toggle behaves as a divider with respect to the signal abstraction fww;
components with the toggle behavior can thus be viewed as frequency
dividers. Similarly, cascades of components having the toggle behavior
(counters, that is) can be viewed as frequency dividers as well, for divisions
by powers of 2.
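
A short simulation makes the divider view concrete. The Python sketch below is
an illustration rather than the TINT model: it assumes the toggle starts in
state 0 and counts falling edges over the whole timeline instead of within a
sliding window, and the helper names are hypothetical.

```python
from typing import List, Sequence


def toggle(s: Sequence[int], initial: int = 0) -> List[int]:
    """Flip the state on every falling edge (1 followed by 0) of the input."""
    out: List[int] = []
    state = initial
    for t, v in enumerate(s):
        if t > 0 and s[t - 1] == 1 and v == 0:
            state = 1 - state
        out.append(state)
    return out


def falling_edges(s: Sequence[int]) -> int:
    """Number of 1 -> 0 transitions in the signal."""
    return sum(1 for p, q in zip(s, s[1:]) if p == 1 and q == 0)


def divider_outputs(clock: Sequence[int], stages: int) -> List[List[int]]:
    """Outputs of a cascade of toggles, each clocked by the previous stage."""
    outputs: List[List[int]] = []
    signal: Sequence[int] = clock
    for _ in range(stages):
        signal = toggle(signal)
        outputs.append(list(signal))
    return outputs


clock = [1, 0] * 64                      # 64 falling edges
edge_counts = [falling_edges(o) for o in divider_outputs(clock, 3)]
```

The toggle function reproduces the (toggle L) row of the table above, and
edge_counts is [32, 16, 8]: each stage halves the number of falling edges of
the stage before it.
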

⁵ Briefly, the derivation considers four cases on S: 1 followed by 1 at time,
1 followed by 0 at time, and so forth. By using the definition of toggle in
each case it can be shown that (cycles-ww n l (toggle S)) must increment by at
least 1 for every 2 increments of the signal (cycles-ww n l S).

This is a useful way of viewing the behavior of toggles and counters
because sometimes their inputs have known frequencies that are stable over
long intervals of time. One way that the frequency of the input signal could
be known over a long interval is if it is the output of an oscillator. For
example, the crystal oscillator in the Console Controller Board generates a
9.8 MHz signal. This is approximated as a frequency of 10^7 cycles per second,
with a window size of a thousand periods, that is, 1000 × 10^-7 seconds:

If [isa ?o oscillator]
and [thru ?l ?u (mode ?o) normal]

Then [thru ?l ?u (fww 10^-4 sec '(0 1) (ll (out 0 ?o))) 10^7]
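
The window size in this rule follows from simple arithmetic, worked here in
Python with exact fractions. The variable names are illustrative; the only
inputs are the 9.8 MHz figure, the 10^7 approximation, and the
thousand-period convention stated in the text.

```python
from fractions import Fraction

actual_hz = Fraction(9_800_000)   # crystal frequency given in the text
model_hz = Fraction(10 ** 7)      # approximation used in the oscillator rule

period = 1 / model_hz             # one modeled period, in seconds
window = 1000 * period            # window of a thousand periods

# How far the modeled frequency is from the stated crystal frequency.
relative_error = (model_hz - actual_hz) / actual_hz
```

The window comes out as exactly 1/10000 of a second, matching the 10^-4 sec
in the rule, and the approximation overstates the crystal frequency by 1/49,
or about 2%.
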

The  behavior  of  the  frequency  divider  allows  the  program  to  predict  what 
the  output  frequency  will  be  over  those  same  long  intervals  of  time.  Express¬ 
ing  its  behavior  in  rules  introduces  some  subtleties. 

The  first  subtlety  is  that  until  now  “power”  has  been  the  only  input 
that  components  required  to  be  in  normal  mode.  The  frequency  divider 
requires  a  separate  constant  1  input.  For  example,  the  Console  Controller 
Board  contains  several  frequency  dividers,  implemented  with  one  or  more  JK 
flipflops  or  with  counters,  and  one  thing  they  all  have  in  common  is  that  they 
have  some  of  their  inputs  pulled  up  to  a  constant  logic-level  of  1.  Figure  5.8 
shows an example; the input (in hi FD) is tied to several JK flipflop inputs.
With both J and K tied to 1, the flipflop toggles its state on each falling clock
edge, and with the Preset and Clear inputs tied to 1 this is the only way it
can change its state. The rule for the mode of the frequency divider thus
includes the condition that the input (ll (in hi ?d)) must be 1:

If [isa ?d frequency-divider]
and [status-of ?d working]
and [thru ?l1 ?u1 (power (in power ?d)) t]
and [thru ?l2 ?u2 (ll (in hi ?d)) 1]
and (overlap (?l1 ?u1) (?l2 ?u2))

Then [thru (max ?l1 ?l2) (min ?u1 ?u2) (mode ?d) normal]

The  second  subtlety  is  that  frequency  dividers  can  be  composed  of  a  cas¬ 
cade  of  toggle  behaviors  (a  ripple  counter  can  be  viewed  this  way)  and  hence 
have  multiple  outputs,  which  by  convention  are  numbered  from  0  upwards. 
The frequency at the nth output is thus $1/2^{n+1}$ that of the input.

Figure 5.8: Frequency Divider Implemented with JKFFs


The  third  and  final  subtlety  is  that  signals  at  lower  frequencies  have  longer 
periods  and  hence  require  a  longer  duration  to  go  through  1000  cycles;  the 
effect  is  that  the  window  size  at  the  nth  output  of  a  frequency  divider  scales 
by $2^{n+1}$. As a result, a single behavior rule for frequency dividers works for
any  number  of  output  ports,  and  makes  deductions  at  different  window  sizes 
on  those  different  ports: 

If [isa ?d frequency-divider]
and [has-port ?d (out ?n ?d)]
and [thru ?l0 ?u0 (mode ?d) normal]
and [thru ?l1 ?u1 (fww ?w ?cyc (ll (in a ?d))) ?f]
and (overlap (?l0 ?u0) (?l1 ?u1))

Then [thru (max ?l0 ?l1) (min ?u0 ?u1)
      (fww (truncate (* ?w (expt 2 (+ 1 ?n))))
           ?cyc (ll (out ?n ?d)))
      (/ ?f (expt 2 (+ 1 ?n)))]
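
The scaling used by this rule can be checked numerically. The following Python sketch is illustrative only: the names `divider_output` and `ripple_toggle_counts` are hypothetical and are not part of the program described in this thesis. It assumes each stage toggles on the falling edge of the stage before it, computes the predicted frequency and window size at output n, and compares the prediction with a direct simulation of a cascade of toggles.

```python
def divider_output(f_in, w_in, n):
    """Predicted (frequency, window size) at output n of a frequency divider."""
    scale = 2 ** (n + 1)
    return f_in / scale, w_in * scale


def ripple_toggle_counts(input_cycles, stages):
    """Simulate a cascade of toggles; return the full cycles seen at each output."""
    states = [0] * stages
    cycles = [0] * stages
    for _ in range(input_cycles):
        # Each input cycle delivers exactly one falling edge to stage 0.
        stage = 0
        while stage < stages:
            states[stage] ^= 1
            if states[stage] == 1:
                break  # rising edge here: nothing propagates further
            cycles[stage] += 1  # falling edge completes one output cycle
            stage += 1
    return cycles


if __name__ == "__main__":
    print(divider_output(10**7, 1, 2))
    print(ripple_toggle_counts(16, 3))
```

In the simulation, output n completes one cycle for every $2^{n+1}$ input cycles, which is why the window needed to observe a fixed number of cycles grows by the same factor.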

The  antibehavior  rule  of  the  frequency  divider  is  similar;  the  frequency 
at  input  a  is  a  multiple  of  that  at  any  output  and  the  window  size  of  mea¬ 
surement  is  a  corresponding  fraction. 

Behaviorally,  two-phase  clock  generators  can  be  viewed  as  frequency  di¬ 
viders  restricted  to  a  single  output  that  is  a  two-phase  clock;  indeed  the  same 
physical  component  may  be  part  of  both  a  frequency  divider  and  of  a  two- 
phase  clock  generator.  Their  behavior  rule  says  that  the  output  frequency 
is  half  that  of  the  input,  measured  with  a  window  size  twice  the  size  of  the 
input: 

If [isa ?c two-phase-clock-generator]
and [thru ?l1 ?u1 (mode ?c) normal]
and [thru ?l2 ?u2 (fww ?w '(0 1) (ll (in a ?c))) ?f]
and (overlap (?l1 ?u1) (?l2 ?u2))

Then [thru (max ?l1 ?l2) (min ?u1 ?u2)
      (fww (* 2 ?w) '(nil t) (cc (out y ?c)))
      (/ ?f 2)]

The  ordinary  behaviors  of  inverters  and  toggles  in  terms  of  moment-by¬ 
moment  changes  of  the  logic  levels  at  their  inputs  and  outputs  can  be  de¬ 
scribed  using  TINT  rules.  Rules  can  also  describe  their  behavior  in  terms 
of  whether  those  signals  are  changing  or  not  and  what  their  frequencies  are. 
Because  these  behaviors  are  event  preserving,  the  rules  and  resulting  predic¬ 
tions  are  strong.  Not  all  behaviors  are  event-preserving,  however;  the  next 
three  sections  present  ways  of  using  temporal  abstractions  in  more  general 
situations. 

5.6 Reduction

Any  function  of  n  inputs  with  one  of  its  inputs  held  constant  yields  a  new 
function  of  n  —  1  inputs,  and  this  fact  can  be  used  to  form  a  temporally 
abstracted  behavior  for  a  multiple  input  behavior  under  the  special  case  of 
its  having  one  or  more  constant  inputs.  The  resulting  behavior  is  incomplete, 
of  course,  in  the  sense  that  it  does  not  cover  cases  in  which  the  inputs  are 
not  constant.  It  is  nevertheless  worthwhile  because  it  provides  an  alternative 
to  the  undesirable  option  of  predicting  all  behavior  at  a  temporally  detailed 
level:  weak  temporally  abstract  predictions  are  better  than  none. 

A  simple  example  is  the  behavior  of  a  two-input  AND  gate  (denoted 
tand2)  under  the  special  case  where  one  of  its  inputs  is  the  constant  signal 
(lambda  (time)  1). 

tand2 ≡

(lambda (A B) (lambda (time) (logand (A time) (B time))))

A straightforward derivation uses the fact that (logand 1 x) ≡ x to
show that if X ≡ (lambda (time) 1) then (tand2 X Y) ≡ Y.
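
The reduction can be mirrored directly in Python, treating a signal as a function from time to a logic level. This is a sketch for illustration: `constant` and `square_wave` are hypothetical helpers introduced here, and the check is made only at a finite set of integer times.

```python
def constant(value):
    """The constant signal (lambda (time) value)."""
    return lambda time: value


def tand2(a, b):
    """Two-input AND lifted to signals, following the definition of tand2."""
    return lambda time: a(time) & b(time)


def square_wave(period):
    """Hypothetical test signal that alternates 0/1 every `period` time units."""
    return lambda time: (time // period) % 2


if __name__ == "__main__":
    y = square_wave(3)
    reduced = tand2(constant(1), y)
    # With one input held at 1, the gate behaves as a buffer for the other input.
    print(all(reduced(t) == y(t) for t in range(50)))
```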

The  rules  for  the  two-input  AND  gate  (component  type  and2)  are  shown 
here;  the  pattern  for  OR,  NAND,  NOR,  XOR,  and  so  forth  should  be  rela¬ 
tively  clear  from  these  examples.  It  is  tedious  but  straightforward  to  write 
separate  rules  for  gates  of  the  same  type  but  with  different  arities. 

If  any  input  of  an  AND  gate  is  0  then  the  output  is  0: 

If [isa ?x and2]
and [thru ?l1 ?u1 (mode ?x) normal]
and [thru ?l2 ?u2 (ll (in ?n ?x)) 0]
and (overlap (?l1 ?u1) (?l2 ?u2))

Then [thru (max ?l1 ?l2) (min ?u1 ?u2) (ll (out y ?x)) 0]
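
The interval arithmetic shared by these rules (the conclusion holds from the latest lower bound to the earliest upper bound of the antecedents) can be sketched as follows. This is an illustrative Python sketch, not the rule engine itself: it assumes closed intervals represented as (lower, upper) pairs, and the function names and the dictionary layout are hypothetical.

```python
def overlap(*intervals):
    """Return the common sub-interval (lower, upper), or None if there is none."""
    lower = max(l for l, _ in intervals)
    upper = min(u for _, u in intervals)
    return (lower, upper) if lower <= upper else None


def and2_zero_rule(mode_normal, input_zero):
    """If the gate is in normal mode and an input is 0, the output is 0
    over the interval where both facts hold."""
    common = overlap(mode_normal, input_zero)
    if common is None:
        return None  # antecedents never hold together: no conclusion
    return {"interval": common, "signal": "(ll (out y ?x))", "value": 0}


if __name__ == "__main__":
    print(and2_zero_rule((0, 10), (4, 20)))
    print(and2_zero_rule((0, 3), (5, 9)))
```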

Note  that  one  of  the  considerations  in  translating  the  definitions  into 
rules  is  that  the  rules  should  be  written  in  such  a  way  as  to  reference  the 
minimal  sets  of  facts  needed  to  make  their  conclusions.  Hence  sometimes 
several  rules  will  be  used  to  represent  a  behavior  that  was  captured  with  a 
single  function.  This  is  because  the  troubleshooting  engine  will  examine  the 
dependencies  left  by  the  rules  to  determine  which  components  could  have 
been  responsible  for  observed  symptoms.  Spurious  dependencies  make  the 
troubleshooting engine waste effort working on components that could not
in  fact  have  caused  the  symptoms. 

The antibehavior rule for the AND gate says that if the output is 1 then
all of the inputs must be 1:

If [isa ?x and2]
and [thru ?l1 ?u1 (mode ?x) normal]
and [thru ?l2 ?u2 (ll (out y ?x)) 1]
and (overlap (?l1 ?u1) (?l2 ?u2))
and [has-port ?x (in ?n ?x)]

Then [thru (max ?l1 ?l2) (min ?u1 ?u2)
      (ll (in ?n ?x)) 1]

Another  rule  for  the  AND  gate  says  that  with  all  but  one  of  its  inputs 
held  to  1,  it  acts  as  a  buffer.  In  the  two-input  case,  this  means  that  as  long 
as  input  ?n  is  1  the  output  is  the  same  as  input  (-  1  ?n): 

If [isa ?x and2]
and [thru ?l0 ?u0 (mode ?x) normal]
and [thru ?l1 ?u1 (ll (in ?n ?x)) 1]
and (overlap (?l0 ?u0) (?l1 ?u1))

Then [tsame (max ?l0 ?l1) (min ?u0 ?u1)
      (ll (in (- 1 ?n) ?x)) (ll (out y ?x))]

The  latter  rule  is  interesting  because  the  identity  between  the  output 
and  free  input  will  have  consequences  for  any  abstraction  of  either  signal, 
including temporal abstractions. The behavior of the AND gate with all
but one of its inputs 1 is a one-to-one function that is event preserving just
like the inverter. Similarly the behavior of a NAND gate when all but one
of its inputs is 1 is just that of an inverter. Hence the temporally abstract
version of the NAND gate refers to the changing-wrt abstraction as does
the inverter rule:

If [isa ?x nand2]
and [thru ?l1 ?u1 (mode ?x) normal]
and [thru ?l2 ?u2 (ll (in ?n ?x)) 1]
and [thru ?a ?z GR t]
and (<= (max ?l1 ?l2) ?a ?z (min ?u1 ?u2))

Then [tsame (max ?l1 ?l2) (min ?u1 ?u2)
      (changing-wrt ?a ?z (ll (in (- 1 ?n) ?x)))
      (changing-wrt ?a ?z (ll (out y ?x)))]

As with the inverter, any of the abstractions change, stay, cycles, or
fww could have been chosen, but changing-wrt refers to easily observable
abstract signals.
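
The event-preserving property behind this rule can be illustrated with a short Python sketch. It assumes signals are represented as lists of sampled logic levels; the helper names are hypothetical. With one NAND input held at 1, every change on the free input appears as a change on the output, so any abstraction that counts or detects changes carries across the gate.

```python
def changes(samples):
    """Count the number of value changes between consecutive samples."""
    return sum(1 for a, b in zip(samples, samples[1:]) if a != b)


def nand2(a, b):
    """Two-input NAND on logic levels 0 and 1."""
    return 1 - (a & b)


if __name__ == "__main__":
    free_input = [0, 1, 1, 0, 1, 0, 0, 1]
    output = [nand2(1, x) for x in free_input]
    # The gate acts as an inverter, so the change counts agree.
    print(changes(free_input), changes(output))
```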

The  behavior  rules  of  other  components  in  the  Console  Controller  Board 
are  similarly  written  in  a  style  that  makes  explicit  the  event-preserving  sub¬ 
sets  of  their  behavior.  For  example,  the  behavior  of  a  JK  flip-flop  with  all  but 
its  clock  input  held  to  1  becomes  toggle,  which  as  discussed  earlier  is  par¬ 
tially  event  preserving.  Also,  multiplexors  are  much  like  buffers,  once  their 
select input is known. Their principal behavior rule equates the output
with  whichever  input  signal  is  selected: 

If [isa ?m multiplexor]
and [thru ?l1 ?u1 (mode ?m) normal]
and [thru ?l2 ?u2 (ll (in select ?m)) ?s]
and (overlap (?l1 ?u1) (?l2 ?u2))

Then6 [tsame (max ?l1 ?l2) (min ?u1 ?u2)
       (in ?s ?m) (out y ?m)]

In  general  by  considering  the  special  case  of  one  or  more  input  signals 
constant,  most  behaviors  can  be  reduced  to  an  event-preserving  behavior. 
These  restricted  temporally  abstract  behaviors  are  ubiquitous  in  the  model 
of  the  Console  Controller  Board. 


6 The predicate tsame can be used with ports as its third and fourth arguments, not
just signals.

5.7  Synchronization 

As discussed earlier, temporal abstractions are useful for behaviors whose
inputs have known relative timing relationships. An important special case
of “known relative timing relationship” occurs when the input signal of some
behavior is a clock whose transitions indicate when the other input signals
are to be sampled. In that case, the samp abstraction can be used to form an
abstracted behavior that is more strongly event-preserving than the original.
In this section the idea will be used to derive a temporally abstract behavior
for a shift register, starting from the behavior of an ordinary register.

The behavior register describes the behavior of a falling-edge triggered
register; the falling edges of the clock input C capture the data D.
syn-register, the abstracted version of the register behavior, captures
the intuition that a register introduces a one-clock delay. Figure 5.9 shows
the relationships between the various signals.

Figure 5.9: Register Abstractions (panels: syn-register behavior, register behavior)


Forming the abstracted version syn-register involves several steps.
First, the clock signal is abstracted with fall. Second, samp is used to
abstract both the output and D input with respect to the falling edges of the
clock. Third, the function synchronous-delay generalizes samp by allowing
for arbitrary delays. Finally, the resulting abstracted behavior for the register
(syn-register) will be easily expressible using synchronous-delay.

An  example  to  give  some  intuition  behind  these  signals  is  given  below. 
The  values  at  times  2  and  6,  when  data  are  latched  into  the  register,  are 
most  important.  “10”  is  latched  into  the  register  and  then  “5”: 


(syn-del 1 (samp (fall c) d))    ?    ?    ?    ?    ?   10   10   10   10    5
(samp (fall c) d)                ?   10   10   10   10    5    5    5    5    5
q ≡ (register c d)               ?    ?   10   10   10   10    5    5    5    5
d                                9   10    5    6    5    5    5    4    4    5
(fall c)                       nil    t  nil  nil  nil    t  nil  nil  nil    t
c                                1    0    0    1    1    0    0    1    1    0
time                             0    1    2    3    4    5    6    7    8    9

The definition for synchronous-delay (abbreviated syn-del) resembles
that for samp, and in fact a delay of 0 is the same as sampling, that is,
(synchronous-delay 0 V S) ≡ (samp V S). The abstracted register behavior
is then simply “a delay of one clock.”
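
To make these abstractions concrete, the following Python sketch recomputes the sampled and delayed signals for the clock c and data d of the example above. It is one plausible reading of the definitions, not the thesis's own code: signals are lists indexed by integer time, `None` stands for the unknown value "?", and the function bodies are assumptions introduced for illustration.

```python
def fall(c):
    """True at each time where the signal has just gone from 1 to 0."""
    return [t > 0 and c[t - 1] == 1 and c[t] == 0 for t in range(len(c))]


def synchronous_delay(k, v, s):
    """Value of s captured k sampling events before the most recent event of v."""
    out, captured = [], []
    for t in range(len(s)):
        if v[t]:
            captured.append(s[t])
        out.append(captured[-1 - k] if len(captured) > k else None)
    return out


def samp(v, s):
    """Sampling is the special case of a zero-event delay."""
    return synchronous_delay(0, v, s)


if __name__ == "__main__":
    c = [1, 0, 0, 1, 1, 0, 0, 1, 1, 0]
    d = [9, 10, 5, 6, 5, 5, 5, 4, 4, 5]
    print(samp(fall(c), d))
    print(synchronous_delay(1, fall(c), d))
```

Defining `samp` through `synchronous_delay` keeps the identity between sampling and a delay of 0 true by construction.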

The  point  of  expressing  the  behavior  of  the  register  using  the  sampling 
abstraction  is  that  the  resulting  behavior  is  more  strongly  event  preserv¬ 
ing  than  the  lower  level  register  behavior.  In  particular,  register  does 
not  preserve  every  change  in  the  value  of  the  input  signal  D;  in  the  exam¬ 
ple  above,  d  changed  from  5  to  6  and  back  to  5  in  between  falling  edges 
of  the  clock,  hence  those  changes  were  not  reflected  on  the  output.  The 
synchronous-delay  function  —  and  hence  the  syn-register  behavior  of 
which  it  is  a  special  case  —  is  mostly  event  preserving,  even  though  it  is  not 
one-to-one.  The  result  is  that  the  following  inequality  holds  at  all  times  for 
any  signals  V  and  S:  the  number  of  changes  at  the  output  (sampled  at  V)  is 
within  1  of  the  number  of  changes  at  the  input: 

((count-ww n (change (samp V S))) time) ≤
((count-ww n (change (syn-register V S))) time) ≤
(+ 1 ((count-ww n (change (samp V S))) time))

This relationship can now be used to derive the temporally abstract shift
register behavior. A shift register configured to convert serial data to parallel
can be viewed as a cascade of one-bit registers all sharing a common clock
input. Figure 5.10 shows the signals s', d0', d1', and d2'; the components
labeled syn-reg compute d0' as a function of s', d1' as a function of d0',
and so forth:


Figure  5.10:  Shift  Register  as  Cascade 


s'  ≡ (samp (fall c) s)
d0' ≡ (samp (fall c) d0)
d1' ≡ (samp (fall c) d1)
d2' ≡ (samp (fall c) d2)


The behavior of a k-bit shift register can be expressed with respect to a
sampling signal as follows:

syn-shift-register ≡

(lambda (k V S) (synchronous-delay k V S))

Hence the temporally abstract behavior of a k-bit shift register is simply
a variation of the inequality shown above for syn-register; the number of
changes that appear on the synchronous output is within k of the number of
changes on the synchronous input:

((count-ww n
   (change (syn-shift-register k V S))) time) ≤
((count-ww n (change (samp V S))) time) ≤
(+ k ((count-ww n
        (change (syn-shift-register k V S))) time))
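
The bound can be checked on a small example. The Python sketch below is illustrative and makes simplifying assumptions: the sampled input is given as a list with one value per sampling event, the k-stage delay is modeled by shifting that list by k positions, and changes are counted cumulatively from the start of the list while ignoring the initial unknown values. The helper names are hypothetical.

```python
def changes(values):
    """Count changes between consecutive known values (None means unknown)."""
    known = [v for v in values if v is not None]
    return sum(1 for a, b in zip(known, known[1:]) if a != b)


def delayed(k, sampled):
    """Output of a k-stage synchronous delay, one value per sampling event."""
    return ([None] * k + list(sampled))[: len(sampled)]


if __name__ == "__main__":
    sampled_input = [0, 1, 0, 1, 1, 0, 1, 0]
    k = 3
    out_changes = changes(delayed(k, sampled_input))
    in_changes = changes(sampled_input)
    # Output changes lag input changes by at most k.
    print(out_changes <= in_changes <= k + out_changes)
```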

One of the consequences of this relationship is that if the incoming signal
to the register has a large enough frequency over a large enough interval,
then the output signals will have positive frequencies as well. Suppose it is
known that over some time interval, the frequency of changes on a signal was
(strictly) bounded below by a positive frequency 1/w:

1/w < ((fww w '(nil t) (change (samp V S))) time)

Then the number of changes during any window must be at least 1:

1 < ((count-ww w '(nil t) (change (samp V S))) time)

Hence for any k > 1 the number of changes during a window of size
k × w is at least k (provided that k × w does not get bigger than the
interval during which the frequency was known):

k < ((count-ww (* k w) '(nil t)
       (change (samp V S))) time)

Using the previously derived bounds on the number of changes on the
outputs of the shift register, the kth output must have at least one change:

0 < ((count-ww (* k w)
       (change (syn-shift-register k V S))) time)

This  derivation  and  its  conditions  can  be  summarized  into  a  single  rela¬ 
tionship.  If  the  incoming  signal  of  the  register  has  a  large  enough  frequency 
over  a  large  enough  interval,  the  output  signals  will  have  positive  frequencies 
as  well;  the  relationship  below  makes  “large  enough”  precise: 

• If (fww w '(nil t) (change V S)) is always > 1/w from time l to
u, and k < (u − l)/w, then from time l + k × w to u, (count-ww (* k w)
(change (syn-shift-register k V S))) is always > 0.
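
The condition in this summary amounts to a small piece of interval arithmetic, sketched below in Python. The function name is hypothetical, and the sketch assumes that the premise about the input frequency (always above 1/w between l and u) has already been established.

```python
def guaranteed_change_interval(l, u, w, k):
    """Interval over which output k is guaranteed to show a change, or None.

    Requires k < (u - l) / w, i.e. the interval must hold more than k windows.
    """
    if not k < (u - l) / w:
        return None
    return (l + k * w, u)


if __name__ == "__main__":
    print(guaranteed_change_interval(0, 100, 10, 4))
    print(guaranteed_change_interval(0, 30, 10, 4))
```

The guaranteed interval starts k windows late because the first k sampled values are still propagating through the register.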

The Audio Decoder contains two shift registers that accumulate incoming
serial data bits. Shift registers used in this fashion are referred to here as
clocked serial accumulators and the rules of their behavior are based on these
relationships. The temporally abstract behavior of a clocked serial accumulator
yields the following rules. The first rule infers from the fact that the
incoming byte stream is changing that all of the output data bits must be
changing:

If [isa ?csa clocked-serial-accumulator]
and [thru ?l1 ?u1 (mode ?csa) normal]
and [thru ?l2 ?u2
     (fww ?w '(nil t) (cross 0 (cs (in a ?csa)))) ?f]
and (< (/ 1 ?w) ?f)
and (overlap (?l1 ?u1) (?l2 ?u2))
and [has-port ?csa (out ?k ?csa)]
and (< ?k (/ (- (min ?u1 ?u2) (max ?l1 ?l2)) ?w))
and [thru ?a ?z GR t]
and (<= (+ (max ?l1 ?l2) (* ?k ?w)) ?a ?z (min ?u1 ?u2))
Then [thru ?z ?z (changing-wrt ?a ?z (ll (out ?k ?csa))) t]

The  second  rule  is  inherited  from  an  ordinary  shift  register.  All  but  the 
last  output  of  a  k-bit  shift  register  is  an  input  to  the  next  stage;  hence  the 
same  relationship  holds  between  output  k  and  output  k  +  j  as  held  between 
the input and output j; in particular, a changing input implies a changing
output, and vice versa. The following rule captures the fact that if output k
is observed to be changing then output k + 1 will, too:

If [isa ?csa clocked-serial-accumulator]
and [thru ?l ?u (mode ?csa) normal]
and [thru ?a ?z GR t]
and (<= ?l ?a ?z ?u)
and [thru ?z ?z (changing-wrt ?a ?z (out ?k ?csa)) t]
and [has-port ?csa (out (+ ?k 1) ?csa)]

Then [thru ?z ?z
      (changing-wrt ?a ?z (out (+ ?k 1) ?csa)) t]

The  point  of  using  temporal  abstractions  is  to  be  able  to  make  predictions 
about component behaviors using simple observations. In this case, there is a
strong relationship between the number of changes of value on the input and
outputs  of  the  shift  register.  If  more  than  k  changes  are  observed  at  the  input 
to  the  shift  register,  the  temporally  abstract  behavior  can  derive  bounds 
on  the  number  of  changes  that  should  be  observed  at  its  output  without 
requiring  clock-by-clock  reasoning.  What  made  it  possible  in  this  example 
was  the  sampling  abstraction,  which  allowed  us  to  represent  synchronous 
signals  and  thereby  describe  the  behavior  of  a  register  as  a  component  that 
introduces  a  delay  between  signals. 

5.8 Encapsulation

Sequential circuits are more difficult to reason about than combinational circuits.
In general, predicting the response to a particular sequence of stimuli
may  require  explicitly  representing  every  intervening  state  change.  The  more 
complex  the  circuit  behavior  —  that  is,  the  more  distinguishable  states  that 
the  circuit  can  be  in  —  the  greater  the  need  for  temporal  abstractions  to 
simplify  that  reasoning.  Up  to  this  point  in  the  discussion,  examples  of  tem¬ 
porally  abstract  behaviors  have  all  been  either  combinational  circuits  or  very 
simple  sequential  circuits  such  as  shift  registers.  This  section  uses  the  previ¬ 
ously  discussed  abstractions  and  abstraction  techniques  to  develop  examples 
of  temporally  abstract  behaviors  for  more  complex  sequential  circuits.  The 
ideas  being  illustrated  are  simple: 

1.  The  behavior  of  a  group  of  components  appearing  in  a  loop  can  be 
expressed  as  the  composition  of  the  component  behaviors  by  introduc¬ 
ing  a  new  signal  that  represents  the  state  of  the  aggregate  component. 
This  encapsulation  alone  does  not  usually  simplify  reasoning  about  the 
behavior  of  the  loop. 

2.  The  goal  of  abstracting  the  behavior  of  a  sequential  circuit  is  to  collapse 
together  equivalent  states  in  its  state  diagram  —  ideally  down  to  a 
single  state  so  that  the  output  of  the  circuit  can  be  expressed  directly 
in  terms  of  its  inputs  without  the  intervening  “state”  signal. 

3.  If  the  behavior  of  a  sequential  device  involves  performing  computations 
that  are  similar  to  counting,  sampling,  recognizing  sequences,  and  so 
forth,  then  a  powerful  way  to  simplify  its  behavior  (that  is,  reduce 
the  number  of  distinguishable  states)  is  to  describe  its  inputs  in  terms 
of  corresponding  temporal  abstractions  such  as  count,  sample,  and 
sequence. 

5.8.1  The  Reset  Hold  Counter 

The  Reset  Hold  Counter  circuit  (Figure  5.11)  from  the  Console  Controller 
Board  is  a  simple  example  that  illustrates  the  role  of  loop  encapsulation  in 
deriving  temporally  abstract  behaviors.  When  the  Reset  signal  is  asserted 
and the clock signal Clock is running at k Hz, the Run signal is asserted for
at least $2^{13}/k$ seconds.


Figure 5.11: Reset Hold Counter (timing diagram of the Reset, Clock, and Run signals)


This circuit, containing a 14-bit counter, has at least $2^{14}$ distinguishable
states,  but  by  using  the  temporal  abstractions  it  is  possible  to  describe  its 
behavior  using  only  three  states.  The  intuition  behind  this  is  that  if  the 
Clock  input  is  known  to  be  periodic,  and  it  is  known  how  long  it  has  been 
since  the  counter  has  been  reset,  then  the  state  of  the  counter  (and  hence  of 
the  circuit  as  a  whole)  is  computable  from  the  product  of  the  clock  frequency 
and  the  length  of  time  the  Reset  signal  has  been  1.  The  temporally  abstract 
behavior  rh  is  derived  at  length  in  Appendix  C. 

This  behavior  can  be  described  as  a  three-state  automaton  (Figure  5.12). 
The  automaton  has  one  of  each  of  three  general  kinds  of  transition  conditions: 
(i)  transitions  out  of  certain  states  caused  by  input  events;  (ii)  transitions 
that occur no matter what the previous state was; (iii) transitions arising
from  being  in  a  given  state  for  a  certain  amount  of  time.  The  interaction 
between  these  three  kinds  of  transitions  shows  up  as  somewhat  complex  per¬ 
sistence  rules.  Complex  as  it  is,  the  encapsulation  of  the  entire  circuit  along 
with  the  frequency  temporal  abstraction  allows  the  resulting  behavior  to  be 
quite  simple  relative  to  the  counter  that  underlies  it. 

Figure  5.12:  Reset  Hold  Counter  Three  State  Automaton 


The  first  transition  rule  says  that  when  the  reset  input  is  1,  the  com¬ 
ponent  goes  into  the  Reset  state.  It  is  not  necessary  to  know  the  previous 
state  nor  the  previous  value  of  the  input,  so  this  rule  is  simpler  than  most 
transition  rules: 

If [isa ?r reset-hold]
and [thru ?l1 ?u1 (mode ?r) normal]
and [thru ?l2 ?u2 (ll (in reset ?r)) 1]
and (<= (+ δ (max ?l1 ?l2)) (min ?u1 ?u2))

Then [thru (+ δ (max ?l1 ?l2)) (+ δ (max ?l1 ?l2))
      (state ?r) Reset]

The persistence rule associated with the Reset state says that ?r stays
there as long as there are no changes from 1 to 0 on the reset input:

If [isa ?r reset-hold]
and [thru ?l1 ?u1 (mode ?r) normal]
and [thru ?l2 ?u2 (event 1 0 (ll (in reset ?r))) nil]
and (overlap (?l1 ?u1) (?l2 ?u2))
and [thru ?l3 ?u3 (state ?r) Reset]
and (<= (max ?l1 ?l2) ?u2 (min ?u1 ?u2))

Then [thru (max ?l1 ?l2 ?l3) (+ δ (min ?u1 ?u2))
      (state ?r) Reset]

From  the  Reset  state,  a  change  from  1  to  0  on  the  reset  input  causes  a 
transition  to  the  Run  state.  Typical  of  most  transition  rules,  the  conclusion 
is  only  warranted  at  the  single  moment  after  the  input  changed: 

If [isa ?r reset-hold]
and [thru ?l1 ?u1 (mode ?r) normal]
and [thru ?l2 ?u2 (event 1 0 (ll (in reset ?r))) t]
and (overlap (?l1 ?u1) (?l2 ?u2))
and [thru ?l3 ?u3 (state ?r) Reset]
and (overlap (?l1 ?u1) (?l2 ?u2) (?l3 ?u3))

Then [thru (+ δ ?u1) (+ δ ?u1) (state ?r) Run]

A separate persistence rule extends the Run state until the reset input
is asserted, or until $2^{13}$ clock cycles have elapsed. Given a frequency of ?f,
$2^{13}$ clock cycles elapse in $2^{13}$ × 1/?f seconds:


If [isa ?r reset-hold]
and [thru ?l1 ?u1 (mode ?r) normal]
and [thru ?l2 ?u2 (event 0 1 (ll (in reset ?r))) nil]
and (overlap (?l1 ?u1) (?l2 ?u2))
and [thru ?l3 ?u3 (fww ?w ?cyc (ll (in clk ?r))) ?f]
and (> ?f 0)
and (overlap (?l1 ?u1) (?l2 ?u2) (?l3 ?u3))
and [thru ?l4 ?u4 (event :any run (state ?r))]
and (overlap (?l1 ?u1) (?l2 ?u2) (?l3 ?u3) (?l4 ?u4))
Then [thru (max ?l1 ?l2 ?l3 ?l4)
      (+ δ (min (+ (max ?l1 ?l2 ?l3 ?l4) (* (expt 2 13) (/ 1 ?f)))
                (min ?u1 ?u2 ?u3)))
      (state ?r) Run]

A transition rule for the Run state makes the transition to the Stop state
happen when enough clock cycles have passed (that is, $2^{13}$ cycles):

If [isa ?r reset-hold]
and [thru ?l1 ?u1 (mode ?r) normal]
and [thru ?l2 ?u2 (state ?r) Run]
and (overlap (?l1 ?u1) (?l2 ?u2))
and [thru ?l3 ?u3 (fww ?w ?cyc (ll (in clk ?r))) ?f]
and (<= (* (/ 1 ?f) (expt 2 13)) (- ?u2 ?l2))

Then [thru (+ δ ?u2) (+ δ ?u2) (state ?r) Stop]

Finally, the Stop state persists so long as no 0 to 1 changes occur on the
reset input:

If [isa ?r reset-hold]
and [thru ?l1 ?u1 (mode ?r) normal]
and [thru ?l2 ?u2 (event 0 1 (ll (in reset ?r))) nil]
and (overlap (?l1 ?u1) (?l2 ?u2))
and [thru ?l3 ?u3 (state ?r) Stop]
and (overlap (?l1 ?u1) (?l2 ?u2) (?l3 ?u3))

Then [thru (max ?l1 ?l2 ?l3) (+ δ (min ?u1 ?u2))
      (state ?r) Stop]

The  behavior  of  the  Reset  Hold  Counter  can  thus  be  expressed  compactly 
by  representing  the  state  of  the  counter  implicitly  with  the  frequency  and 
duration  temporal  abstractions. 
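
The three-state behavior can be sketched as a small simulation. The Python below is a simplified illustration rather than the rule set itself: it assumes the reset input is sampled at a fixed step `dt`, ignores the small propagation delay that appears in the rules, starts in the Stop state, and uses hypothetical names throughout. The point it illustrates is that the $2^{13}$ counter states are replaced by a single elapsed-time comparison.

```python
HOLD_CYCLES = 2 ** 13  # clock cycles the Run state is held


def reset_hold_states(reset_samples, clock_hz, dt):
    """Return the abstract state (Reset, Run, or Stop) after each reset sample."""
    hold_seconds = HOLD_CYCLES / clock_hz
    state, run_time, previous = "Stop", 0.0, 0
    trace = []
    for reset in reset_samples:
        if reset == 1:
            state = "Reset"  # entered regardless of the previous state
        elif state == "Reset" and previous == 1:
            state, run_time = "Run", 0.0  # a 1-to-0 change on reset starts the hold
        elif state == "Run":
            run_time += dt
            if run_time >= hold_seconds:
                state = "Stop"  # enough clock cycles have elapsed
        previous = reset
        trace.append(state)
    return trace


if __name__ == "__main__":
    # With an 8192 Hz clock the hold time is exactly one second.
    print(reset_hold_states([0, 1, 1, 0, 0, 0, 0, 0, 0, 1], 8192, 0.25))
```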

5.8.2 The Audio Counter

The Audio Counter (Figure 5.13) bears obvious similarities to the Reset Hold
Counter  discussed  above,  but  it  has  subtle  differences  that  lead  to  a  differ¬ 
ent  temporally  abstract  behavior.  The  relevant  temporal  abstraction  is  the 
samp  abstraction  encountered  earlier.  Using  this  abstraction  the  temporally 
abstract  behavior  of  the  encapsulated  Audio  Counter  will  resemble  that  of  a 
frequency  divider. 

Figure  5.13:  Audio  Counter 

While the Reset input of the Reset Hold Counter starts the counter back
at 0 whenever asserted, in the Audio Counter only the first 1-to-0 transition
of the Start signal matters. Eighteen clock cycles must pass before the zero
state can be reached again: while counting, it is insensitive to the Start
signal.

Some  temporal  abstractions  that  applied  to  earlier  examples  can  be  ap¬ 
plied  to  the  behavior  of  this  circuit;  however,  the  assumptions  on  which  they 
depend  are  violated  by  the  normal  usage  of  the  circuit  and  so  the  result¬ 
ing  temporally  abstract  behaviors  have  little  predictive  force.  For  example, 
while the signal Mab is a constant 1, the Audio Counter forms a frequency
divider  with  respect  to  the  Clock  input;  however,  the  clocks  come  in  bursts 
of  eighteen  and  normally  the  Start  line  goes  low  at  least  once  per  burst  — 
the  “frequencies”  are  thus  defined  over  so  few  cycles  as  to  be  useless.  For 
another  example,  the  “counting”  behavior  of  the  Audio  Counter  can  be  cap¬ 
tured  by  the  product  of  a  frequency  and  a  duration,  but  only  during  the 
bursts  of  eighteen  clock  cycles  and  hence  this  is  similarly  useless. 

Appendix D shows the derivation of a behavior that is event-preserving:
n falling edges on the Start signal sampled with respect to falling edges of
the Clock will result in somewhere between $\lfloor n/18 \rfloor$ and n falling edges on Mab.
Thus the number of falls on Mab (measured with respect to rising edges of
Clock) is bounded as follows:

((count-ww
   n (fall (samp (rise Clock) Start))) time) ≥
((count-ww
   n (fall (samp (rise Clock) Mab))) time) ≥
(floor
  ((count-ww
     n (fall (samp (rise Clock) Start))) time)
  18)
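
The bound is easy to apply when checking an observation. In the Python sketch below, the function names are hypothetical; given a count of sampled falling edges on Start, it returns the range of falling edges on Mab that is consistent with normal behavior, and tests whether an observed count falls inside it.

```python
def mab_fall_bounds(start_falls):
    """(lower, upper) bounds on falling edges of Mab for a given count on Start."""
    return start_falls // 18, start_falls


def consistent_with_normal(start_falls, mab_falls):
    """True if the observed Mab count lies within the predicted bounds."""
    lower, upper = mab_fall_bounds(start_falls)
    return lower <= mab_falls <= upper


if __name__ == "__main__":
    print(mab_fall_bounds(40))
    print(consistent_with_normal(40, 1))
```

An observation outside the range is a discrepancy, although a count inside the range does not by itself show that the counter is working.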

A  similar  inequality  was  derived  earlier  for  the  shift  register,  with  the 
consequence  that  a  relationship  could  be  defined  between  the  frequency  at 
the  input  and  at  the  output.  A  similar  derivation  for  the  eighteen-counter 
results  in  a  similar  relationship.  If  the  incoming  signal  of  the  counter  has 
a  large  enough  frequency  over  a  large  enough  interval,  the  output  signals 
will  have  positive  frequencies  as  well;  the  relationship  below  makes  “large 
enough”  precise: 

• If (fww w '(nil t) (change V S)) is always > 1/w from time l to u,
and 18 < (u − l)/w, then from time l + 18w to u, (count-ww (* 18 w)
(change (eighteen-counter V S))) is always > 0.

The  relevant  behavior  rule  thus  looks  very  similar  to  the  rule  for  the 
accumulator;  the  difference  is  that  the  conditions  under  which  it  can  be 
deduced  that  a  changing  input  will  result  in  a  changing  output  are  more 
restricted  than  for  the  accumulator: 

If [isa ?csb clocked-serial-burst-detector]
and [thru ?l1 ?u1 (mode ?csb) normal]
and [thru ?l2 ?u2
     (fww ?w '(nil t) (cross 0 (cs (in a ?csb)))) ?f]
and (< (/ 1 ?w) ?f)
and (overlap (?l1 ?u1) (?l2 ?u2))
and (< 18 (/ (- (min ?u1 ?u2) (max ?l1 ?l2)) ?w))
and [thru ?a ?z GR t]
and (<= (+ (max ?l1 ?l2) (* 18 ?w)) ?a ?z (min ?u1 ?u2))
Then [thru ?z ?z (changing-wrt ?a ?z (ll (out y ?csb))) t]

This  temporally  abstract  behavior  rule  is  useful  because  it  uses  simple 
observations  about  signals  to  yield  other  easily  observed  predictions. 

5.8.3  Microprocessors 

The  behavior  of  a  microprocessor  can  in  principle  be  represented  as  an  enor¬ 
mous  finite  state  automaton.  However,  its  behavior  can  be  represented  in 
a  temporally  abstract  way  by  characterizing  its  behavior  in  just  two  states: 
Stop  and  Run.  The  Console  Controller  Board  contains  two  eight-bit  mi¬ 
croprocessors,  an  Intel  8035  and  an  Intel  8741.  These  microprocessors  run 
instructions  only  when  their  incoming  clocks  are  valid  two-phase  clocks  of  no 
more than 5 Mhz. The abstraction two-phase-clock maps a pair of {1, 0}
signals to {t, nil}, where t marks the end of a two-phase clock cycle. To be
in  the  Run  state  the  processors  must  have  their  reset  input  unasserted  and 
the  incoming  clock  signal  be  a  valid  two-phase  clock  with  frequency  less  than 
5  Mhz: 

0 < ((fww n '(nil t) (two-phase-clock C5MhzH C5MhzL)) time) < 5 * 10^6

When not running they are in the Stop state and their outputs are idle.
In the Run state some of their outputs are periodic. For example, in the Run
state of the I8035 with a 5 Mhz clock, the PSEN output runs at 50 Khz and
the ALE output at 300 Khz; these frequencies are asserted by a rule using
window sizes corresponding to a thousand cycles of each signal:

If [isa ?i I8035]
and [thru ?l ?u (state ?i) Run]
Then [thru ?l ?u (fww 1000 × (1 ÷ 300Khz)
                      '(0 1) (ll (out psen ?i))) 300Khz]
and [thru ?l ?u (fww 1000 × (1 ÷ 50Khz)
                     '(0 1) (ll (out ale ?i))) 50Khz]

Similarly, in its Run state the I8741 provides clocks and initialization
signals  to  the  keyboard  and  keypad  at  frequencies  in  the  neighborhood  of 
20  Khz. 
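
The two-state abstraction and the choice of window size can be summarized in a few lines of Python. This is a minimal sketch, not the implemented rule: the function names are hypothetical, and the strict upper bound follows the inequality given above.

```python
MAX_CLOCK_HZ = 5e6  # upper bound on a valid two-phase clock, from the text


def processor_state(reset_asserted, clock_hz):
    """Return 'Run' or 'Stop' for the two-state abstraction."""
    clock_valid = 0 < clock_hz < MAX_CLOCK_HZ
    return "Run" if (not reset_asserted and clock_valid) else "Stop"


def window_size(expected_hz, cycles=1000):
    """Observation window in seconds spanning `cycles` periods of a signal."""
    return cycles * (1.0 / expected_hz)
```

For a 50 Khz output, for instance, `window_size(50e3)` gives a window of about 20 msec, that is, a thousand cycles of the signal.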

5.8.4  Abstract  Buffers 

The  behavior  of  a  single-input  single-output  buffer  is  the  identity  function:  its 
output  at  each  moment  is  the  same  as  its  input.  Since  the  identity  function 
is  one-to-one,  buffers  are  event  preserving  and  lend  themselves  to  temporally 
abstract behavior descriptions. There are only a few buffers per se in a typical
digital circuit. On the other hand, digital circuits often have substantial
amounts of circuitry devoted to doing information-preserving transformations
of data from one encoding to another (from serial to parallel, for example).
At the right level of temporal abstraction, modulo the nuances of the different
data formats, many seemingly complex circuits are really “buffers” in this
broader  sense.  Buffers  thus  appear  in  the  Console  Controller  Board  at  various 
levels  of  temporal  abstraction.  In  the  Audio  Counter  the  Manchester-to-serial 
converter,  for  example,  is  just  a  buffer  when  viewed  with  respect  to  incoming 
(Manchester  encoded)  and  outgoing  (encoded  serially  with  a  clock)  signals: 

If [isa ?m manchester-to-serial]
and [thru ?l ?u (mode ?m) normal]
Then [tsame ?l ?u (manchester (in a ?m)) (cs (out y ?m))]

Similarly, the serial-to-parallel converter can be viewed as a buffer
between byte streams encoded synchronously and serially (cs) and in parallel
(cp):

If [isa ?s serial-to-parallel]
and [thru ?l ?u (mode ?s) normal]
Then [tsame ?l ?u (cs (in data ?s)) (cp (out y ?s))]

The behavior rules for buffer-like behaviors all deduce a tsame relation
between their inputs and outputs.
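
The sense in which these converters are “buffers” can be illustrated by showing that the same data survives each change of encoding. The sketch below is an assumption-laden illustration: it uses the IEEE 802.3 Manchester convention (1 as low-to-high, 0 as high-to-low) and most-significant-bit-first bytes, neither of which is stated in the text, and all function names are hypothetical.

```python
def manchester_encode(bits):
    """Encode bits as half-bit pairs: 1 -> (0, 1), 0 -> (1, 0)."""
    out = []
    for b in bits:
        out.extend((0, 1) if b else (1, 0))
    return out


def manchester_decode(halves):
    """Recover the bit stream; the second half of each pair is the bit."""
    pairs = zip(halves[0::2], halves[1::2])
    return [second for first, second in pairs if first != second]


def serial_to_parallel(bits, width=8):
    """Group a serial bit stream into integers, most significant bit first."""
    words = []
    for i in range(0, len(bits) - width + 1, width):
        word = 0
        for b in bits[i:i + width]:
            word = (word << 1) | b
        words.append(word)
    return words


bits = [1, 0, 1, 1, 0, 0, 1, 0]
round_trip = manchester_decode(manchester_encode(bits))
```

Viewed through the appropriate abstraction at each end, the input and output carry the same sequence of values, which is what the buffer rules assert.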

5.8.5  Programmed  Microprocessors 

Encapsulation  and  temporal  abstraction  can  be  applied  to  circuits  containing 
microprocessors.  In  doing  so,  the  resulting  behaviors  collapse  large  state 
transition  diagrams  into  tiny  ones  and  sacrifice  a  great  deal  of  precision. 
They  are  useful  for  troubleshooting  because  they  allow  predictions  to  be 
made  efficiently  about  temporally  coarse  features  of  signals.  The  behavior 
model  for  the  8741  processor  on  the  Console  Controller  Board,  for  example, 
predicts  little  more  than  that  if  the  processor  is  running,  rolling  the  mouse 
around  will  cause  it  to  assert  one  of  its  outputs  several  hundred  times  a 
second.  Although  very  coarse,  it  is  useful  because  (i)  it  is  easy  to  distinguish 
between that output being idle and being very active, (ii) a significant fraction
of  faults  in  the  processor  would  cause  that  output  to  be  idle,  and  (iii)  it  is 
more  efficient  than  reasoning  about  hundreds  of  identical  events  individually. 
The  examples  will  be  presented  by  encapsulating  the  component  behaviors  in 
bottom-up  fashion,  eventually  constructing  a  behavior  for  a  group  of  several 
chips  including  two  microprocessors. 

(The  material  in  the  remainder  of  this  section  involves  many  details  spe¬ 
cific  to  the  Console  Controller  Board;  readers  pressed  for  time  may  wish  to 
skip  forward  to  Page  163.) 

The  temporally  abstract  behavior  of  U,  the  Input  Processor,  was  used  in 
the  Input  Encoder  troubleshooting  examples  (Figure  3.10  on  Page  58  shows 
the  functional  organization  of  the  Input  Encoder).  U  consists  of  the  Intel  8741 
microprocessor  mentioned  above,  along  with  the  onboard  PROM  that  stores 
its  control  program.  Most  of  the  behavior  of  U  is  simple  enough  to  represent 
usefully with temporal abstractions. With the right temporal abstractions
and assumptions about its incoming signals, its behavior can be expressed
as  a  combinational  function  of  its  inputs.  The  essential  abstractions  making 
this  possible  are  as  follows: 

•  For  troubleshooting  purposes,  the  most  important  properties  of  the 
incoming  keyboard  and  mouse  data  signals  can  be  concisely  expressed 
in  terms  of  changes  and  rates  of  change. 

•  Although  U  sends  all  its  output  packets  over  a  common  eight-bit  bus, 
the  rate  at  which  different  types  of  packets  are  sent  is  substantially  dif¬ 
ferent,  and  this  can  be  taken  advantage  of  in  representing  its  temporally 
abstract  behavior. 

The  Keyboard  and  Keypad  inputs  are  encoded  serially  and  synchronously, 
while  the  behavior  of  the  Console  Controller  Board  refers  to  changes  in  the 
state  of  individual  keys.  Temporal  abstractions  are  needed  to  map  from 
the  low  level  encoding  up  to  the  level  of  changes  in  key  positions.  The 
Keyboard  signal  is  taken  as  the  specific  example;  the  Keypad  signal  is  treated 
analogously. 

The  full  state  of  the  keyboard  —  the  position  of  every  key  —  is  trans¬ 
mitted  repeatedly  to  the  Input  Processor  approximately  a  thousand  times  a 
second. There are three digital signals that accomplish this: kbd-reset,
kbd-clock, and kbd-data. Figure 5.14 shows an example: the kbd-reset signal is
asserted to indicate that a new scan of the keyboard is beginning, kbd-clock
has one rising edge for each of 88 keys, and kbd-data is 0 wherever the
corresponding key is pressed, in this case the key in the third position on the
keyboard.  While  all  the  keys  are  up  the  signal  kbd-data  is  a  constant  1. 

Figure 5.14: The Third Key is Pressed

The temporally abstracted signal kbd-state represents the accumulated
bits in each previous sequence of 88 clock cycles. The remaining abstraction
needed is to represent the signal in terms of changes of the state of the
keyboard, that is, of changes in the position of the keys. The signal
kbd-events represents that abstraction.

kbd-events | nil ...  (Abort Down)  nil
kbd-state  | 0   ...  512           0
time       | 0   ...  1000          1001 ...

These  abstractions  map  from  underlying  serial  signals  up  to  a  vocabulary 
of  events  on  individual  keys.  No  further  temporal  abstractions  are  needed 
for  representing  keys,  since  the  rate  at  which  keys  can  change  is  low  enough 
to  be  easily  observable. 
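
The two keyboard abstractions can be sketched as follows. This is a minimal illustration under stated assumptions: kbd-state is modeled as a tuple of 88 bits per scan (0 meaning pressed, as in the text), the kbd-data stream is assumed to be already sampled on kbd-clock, and keys are identified by position rather than by name.

```python
NUM_KEYS = 88  # one kbd-clock rising edge per key, from the text


def kbd_states(kbd_data_bits):
    """Split the sampled kbd-data bit stream into complete 88-bit scans."""
    return [
        tuple(kbd_data_bits[i:i + NUM_KEYS])
        for i in range(0, len(kbd_data_bits) - NUM_KEYS + 1, NUM_KEYS)
    ]


def kbd_events(states):
    """Return (scan_index, key_position, 'Down' or 'Up') for each key change."""
    previous = (1,) * NUM_KEYS      # kbd-data is a constant 1 while keys are up
    events = []
    for scan, state in enumerate(states):
        for pos, (old, new) in enumerate(zip(previous, state)):
            if old != new:
                events.append((scan, pos, "Down" if new == 0 else "Up"))
        previous = state
    return events


all_up = [1] * NUM_KEYS
third_down = [1, 1, 0] + [1] * (NUM_KEYS - 3)
stream = all_up + third_down + third_down + all_up
events = kbd_events(kbd_states(stream))
```

A thousand scans per second collapse into just two events here, one when the third key goes down and one when it comes back up.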

Like  the  keyboard  inputs,  the  inputs  from  the  mouse  are  encoded  in  a  way 
that  is  too  low-level  to  be  useful  for  troubleshooting;  all  that  really  needs  to 
be  represented  is  whether  the  mouse  is  traveling  in  the  x  and  y  dimensions. 
Again,  temporal  abstractions  can  map  from  the  level  of  implementation  up 
to  rates  of  travel.  The  movement  of  the  mouse  along  the  x  axis  is  represented 
using a 2-bit Gray code on the (misleadingly named) signals mouse-left and
mouse-right. Each move by 1/100 inch in the positive direction results in one
of the events (0 0) → (0 1) → (1 1) → (1 0) → (0 0); the reverse for
the  negative  direction.  Hence  the  net  travel  (not  the  net  change  in  position) 
during  an  interval  n  yields  the  number  of  events.  The  temporally  abstract 
signals  mouse-dx  and  mouse-dy  are  defined  with  an  observation  window  size 
of  one  second  (since  the  mouse  travels  at  up  to  10  inches  per  second,  hence 
there  are  1000  events  per  second,  hence  a  one-second  window  is  1000  times 
the  typical  period). 
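
The distinction between net travel and net change in position can be made concrete with a short Python sketch. The function names are hypothetical; the transition cycle is the one given in the text.

```python
# Positive-direction cycle of the 2-bit Gray code, from the text.
FORWARD = [(0, 0), (0, 1), (1, 1), (1, 0)]


def step_direction(old, new):
    """Return +1, -1 or 0 for one transition of the (left, right) pair."""
    if old == new:
        return 0
    i, j = FORWARD.index(old), FORWARD.index(new)
    if (i + 1) % 4 == j:
        return 1
    if (i - 1) % 4 == j:
        return -1
    raise ValueError("invalid Gray-code transition: %r -> %r" % (old, new))


def net_travel(samples):
    """Total number of movement events, regardless of direction."""
    return sum(abs(step_direction(a, b)) for a, b in zip(samples, samples[1:]))


def net_displacement(samples):
    """Signed change in position, in steps."""
    return sum(step_direction(a, b) for a, b in zip(samples, samples[1:]))


samples = [(0, 0), (0, 1), (1, 1), (0, 1), (0, 0)]
```

For this sequence the mouse moves two steps forward and two steps back: the displacement is zero, but the travel, and hence the number of events seen by the Input Processor, is four.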

The behavior of U can now be expressed in terms of the signals just
described, namely, the temporally abstract clock, keyboard, and mouse
inputs, along with the Reset input. While the Reset line is asserted the
outputs of U are inactive. While the clock is running (that is, while the
signal (two-phase-clock C5MhzH C5MhzL) has frequency 5 Mhz) the I8741 waits
for events indicating mouse motions and keystrokes, and when such an event
occurs it asserts the interrupt line int, causing an interrupt cycle and
ultimately resulting in the transfer of a packet to C. The behavior of U thus
merges the various incoming events into a single outgoing stream of packets.
The output signal packets is defined so that it is nil everywhere except when
a packet is being transmitted, for example, '(Local Down) to represent the
“local” key being pressed, '(Mouse Right) to indicate that the mouse has
moved 1/100 inch to the right, and so forth.

The  temporal  scale  at  which  mouse  events  and  keyboard  events  occur  and 
their  effects  on  the  behavior  of  the  Input  Encoder  are  substantially  different. 
Mouse  motion,  for  example,  never  changes  the  state  of  the  Input  Encoder, 
while  events  of  the  “local”  key  change  the  behavior  of  Input  Encoder  dra¬ 
matically.  Furthermore,  it  is  rarely  the  case  that  the  mouse  is  rolled  around 
at  the  same  time  as  the  keyboard  is  being  typed  at  —  or  at  least  this  can 
be  guaranteed  while  troubleshooting.  As  a  consequence,  it  is  useful  to  define 
the  behavior  of  U  under  these  different  conditions  at  two  different  temporal 
resolutions: 

1.  While  the  mouse  is  inactive,  packets  essentially  merges  the  keyboard 
and  keypad  events,  with  int  being  asserted  once  per  packet. 

2. While the keyboard and keypad are inactive, (tsign (count-ww n
packets)) is just the (qualitative) sum of the mouse-dx and mouse-dy
inputs, with (tsign (count-ww n (fall int))) having the same
value.

Under  conditions  1  and  2,  U  preserves  events  on  the  keyboard  and  mouse 
inputs  respectively;  the  different  rates  at  which  such  events  occur  means  that 
different  temporal  abstractions  are  appropriate  for  representing  the  resulting 
behavior. 

The Input Processor U, like the I8741 that it contains, has a Stop
and a Run state. The difference between the Input Processor and the
I8741 is the level of abstraction of their inputs and outputs. The inputs
of U are the temporally abstract keyboard, keypad, and mouse inputs.
The incoming kbd-state signal that transmits the state of the keyboard
appears at (ks (in kbd U)), and for the keypad at (ks (in kpd U)). The
keyboard-events abstraction applied to these signals yields, respectively,
the inputs (kt (in kbd U)) and (kt (in kpd U)). The signals mouse-dx
and mouse-dy transmitting the direction of mouse motion appear at
(mmx (in mouse U)) (along the x axis) and (mmy (in mouse U)) (along
the y axis). The output of U is the interrupt signal INT. The salient rules
governing  the  behavior  of  the  Input  Processor  are  given  below. 

While  in  the  Stop  state  (that  is,  while  the  reset  line  is  asserted),  the 
INT output signal is a constant 1:

If [isa ?i input-processor]
and [thru ?l ?u (state ?i) Stop]
Then [thru ?l ?u (ll (out int ?i)) 1]

While the mouse inputs are idle, each incoming keyboard or keypad event
results in the interrupt line being held low:

If [isa ?i input-processor]
and [thru ?l1 ?u1 (state ?i) Run]
and [thru ?l2 ?u2 (mmx (in mouse ?i)) 0]
and (overlap (?l1 ?u1) (?l2 ?u2))
and [thru ?l3 ?u3 (mmy (in mouse ?i)) 0]
and (overlap (?l1 ?u1) (?l2 ?u2) (?l3 ?u3))
and [thru ?l4 ?u4 (kt (in kbd ?i)) ?kbd]
and (overlap (?l1 ?u1) (?l2 ?u2) (?l3 ?u3) (?l4 ?u4))
and [thru ?l5 ?u5 (kt (in kpd ?i)) ?kpd]
and (overlap (?l1 ?u1) (?l2 ?u2)
             (?l3 ?u3) (?l4 ?u4) (?l5 ?u5))
Then [thru (max ?l1 ?l2 ?l3 ?l4 ?l5)
           (min ?u1 ?u2 ?u3 ?u4 ?u5)
           (ll (out int ?i))
           (if (or ?kbd ?kpd) 0 1)]
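
Rules of this form conclude something over the intersection of the intervals named in their antecedents: the latest lower bound and the earliest upper bound. A minimal Python sketch of that computation, with a hypothetical `overlap` helper operating on fixed numeric bounds, is:

```python
def overlap(*intervals):
    """Return the common (lower, upper) of closed intervals, or None."""
    lower = max(l for l, _ in intervals)
    upper = min(u for _, u in intervals)
    return (lower, upper) if lower <= upper else None


# Five antecedent intervals, as in the rule above (values are illustrative).
conclusion_interval = overlap((0, 100), (10, 90), (5, 95), (20, 80), (15, 60))
```

Because the bounds are fixed numbers, the test costs only one pass over the intervals, which is the tractability argument made for this representation.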

As  long  as  no  keyboard  or  keypad  events  occur,  changes  on  the  interrupt 
line occur only when there is motion on the mouse inputs:

If [isa ?i input-processor]
and [thru ?l1 ?u1 (state ?i) Run]
and [thru ?l2 ?u2 (kt (in kbd ?i)) nil]
and (overlap (?l1 ?u1) (?l2 ?u2))
and [thru ?l3 ?u3 (kt (in kpd ?i)) nil]
and (overlap (?l1 ?u1) (?l2 ?u2) (?l3 ?u3))
and [thru ?l4 ?u4 (mmx (in mouse ?i)) ?mmx]
and (overlap (?l1 ?u1) (?l2 ?u2) (?l3 ?u3) (?l4 ?u4))
and [thru ?l5 ?u5 (mmy (in mouse ?i)) ?mmy]
and (overlap (?l1 ?u1) (?l2 ?u2) (?l3 ?u3)
             (?l4 ?u4) (?l5 ?u5))
and [thru ?a ?z GR t]
and (<= (max ?l1 ?l2 ?l3 ?l4 ?l5) ?a
        ?z (min ?u1 ?u2 ?u3 ?u4 ?u5))
Then [thru (max ?l1 ?l2 ?l3 ?l4 ?l5)
           (min ?u1 ?u2 ?u3 ?u4 ?u5)
           (changing-wrt ?a ?z (ll (out int ?i)))
           (eql '+ (qplus ?mmx ?mmy))]
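
The qualitative sum used in the consequent can be illustrated with the usual sign algebra. This sketch assumes that qplus follows the standard rules for adding signs, with an explicit ambiguous value when the operands disagree; the definition actually used by the program is not given in this section and may differ.

```python
def tsign(x):
    """Map a number to its qualitative sign: '+', '0' or '-'."""
    return "+" if x > 0 else "-" if x < 0 else "0"


def qplus(a, b):
    """Qualitative sum of two signs; '?' when the result is ambiguous."""
    if a == "0":
        return b
    if b == "0" or a == b:
        return a
    return "?"


# Interrupt activity is predicted when either axis shows travel.
moving_in_x_only = qplus(tsign(40), tsign(0)) == "+"
mouse_at_rest = qplus(tsign(0), tsign(0)) == "+"
```

Under these assumptions the interrupt line is predicted to be changing when the mouse travels along either axis, and not changing when both rates are zero.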

Finally,  the  Input  Processor  has  an  antibehavior  rule  that  infers  that  the 
Reset  line  must  be  0  if  there  was  a  keyboard  event  but  the  interrupt  line  was 
never  asserted.  This  is  a  compression  of  a  more  complex  line  of  reasoning 
that  would  infer  that  it  must  have  been  in  the  Stop  state: 

If [isa ?i input-processor]
and [thru ?l1 ?u1 (mode ?i) normal]
and [thru ?l2 ?u2 (ll (out int ?i)) 1]
and (overlap (?l1 ?u1) (?l2 ?u2))
and [thru ?l3 ?u3 (kt (in kbd ?i)) ?event]
and (overlap (?l1 ?u1) (?l2 ?u2) (?l3 ?u3))
and (not (null ?event))
Then [thru (max ?l1 ?l2 ?l3) (min ?u1 ?u2 ?u3)
           (ll (in reset ?i)) 0]

As  noted  at  the  beginning  of  this  subsection,  the  temporally  abstract  be¬ 
havior  of  U  is  a  combinational  function  of  its  inputs.  This  was  made  possible 
by  temporal  abstractions  that  (i)  represented  the  incoming  clocks  in  terms  of 
their frequency and relative phase, (ii) represented the other inputs in terms
of their events and rate of events, and (iii) matched the rate at which certain
events occur. The resulting behavior exposes the simple, important,
event-preserving relationship between keystrokes, mouse motions, and activity on
the  interrupt  signal  int. 

The  component  C  treated  as  a  “black  box”  in  the  Input  Encoder  trou¬ 
bleshooting  examples  has  a  similarly  abstract  behavior  description.  C  is 
actually  the  culmination  of  three  intermediate  levels  of  structural  composi¬ 
tion  and  behavioral  abstraction.  This  behavior  will  be  developed  starting  at 
the  lowest  level.  The  first  level  of  composition  contains  a  loop  that  involves 
an  Intel  8035  microprocessor,  a  PROM,  and  two  ancillary  chips;  the  result 
of  that  composition  will  be  called  P.  There  are  three  essential  abstraction 
steps: 

1.  The  microprocessor  communicates  via  a  bidirectional  bus,  but  this  com¬ 
plicates  behavior  descriptions;  hence  a  distinction  is  made  between  the 
incoming  and  outgoing  signals  of  the  microprocessor,  sent  along  the 
same  bus  at  different  times. 

2.  At  the  temporal  scale  of  individual  instructions,  each  address  that  the 
microprocessor  presents  to  the  PROM  depends  on  what  the  previously 
returned  instruction  was.  However,  some  of  the  outputs  of  the  micro¬ 
processor  do  not  depend  on  the  instructions  being  executed,  and  this 
fact  can  be  used  to  form  useful  temporal  abstractions  of  the  micropro¬ 
cessor  behavior. 

3. Temporal abstractions can simplify the composed behavior of the
microprocessor and PROM down to only four states. This drastic
simplification is possible because most of the time the Console Controller is
merely  buffering  incoming  data  from  the  keyboard  and  mouse. 

Figure 5.15 shows the microprocessor I8035 and the components used to
present instruction addresses to the PROM (the original circuit schematic is
part of Page 60). The eight-bit bidirectional bus connected to the
microprocessor ports AD7-0 has been divided into an outgoing “address” signal A7-0
and an incoming “instruction” signal I7-0. The signals are valid at different
times in the basic instruction cycle of the I8035, and the abstract signals
shown are the results of a sampling abstraction with respect to the clocks
Clk1 and Clk2.

Figure 5.15: Functional Organization of Console Controller


The structural composition of the I8035, PROM, and ancillary
components forms the component P, whose behavior is just that resulting from the
I8035 executing the program stored in the PROM. The stored console control
program  implements  an  idle  loop  that  responds  to  interrupts  from  the  Input 
Processor  by  reading  a  packet  and  (usually)  sending  it  on  to  the  host.  Some 
sequences  of  keystroke  packets  are  not  sent  on,  but  are  intercepted  and  cause 
the  program  to  perform  operations  local  to  the  console,  such  as  changing 
the  brightness  of  the  screen.  Portions  of  the  behavior  of  P  can  be  described 
in  a  temporally  abstract  way.  For  example,  the  eight  bits  of  the  main  bus 
over  which  the  addresses  and  instructions  are  transmitted  should  never  be 
flat  for  more  than  a  few  clock  cycles;  similarly,  during  the  idling  loop  of  the 
program  the  RD  and  WR  signals  are  asserted  periodically.  For  these  signals, 
(tsign (fww n '(0 1) ...)) is + while C is running. Although P has less
complex  behavior  than  the  microprocessor  —  it  has  fewer  distinct  states  — 
aside  from  these  few  signals  it  still  does  not  lend  itself  to  temporal  abstrac¬ 
tion;  its  interactions  with  U,  for  example,  must  be  reasoned  about  at  the 
level  of  individual  instructions. 

P  communicates  with  several  slave  components  via  a  bidirectional  bus, 
but  since  most  of  these  communications  are  one-way,  it  is  useful  to  represent 
the paths between the processor and each slave as a separate signal. This
abstraction is represented as a second level of composition that forms the
component  B.  B  is  a  composition  of  P  along  with  the  addressing  and  tim¬ 
ing  circuitry  that  mediates  these  communications  (Figure  5.16).  The  Audio 
Decoder,  Brightness  and  Loudness  registers,  and  the  Serial  Encoder  are  all 
write-only;  the  mode  switches  and  Input  Processor  are  read-only. 


Figure 5.16: Components of B

Each of the input and output signals of P is a temporal slice of the
bidirectional  bus  that  P  communicates  over.  That  is,  it  is  the  result  of 
sampling  the  bus  at  particular  moments.  To  be  specific,  the  value  that  the 
abstract  signal  carries  is  the  value  being  sent  at  the  moment  to  the  given 
destination,  and  is  otherwise  nil.  An  example  in  which  the  value  “20”  is 
being written to the brightness register is represented as:

The signals to-audio, to-loudness, and from-switches are eight-bit integers
like  to-brightness;  the  signals  from-input-processor  and  to-serial  carry  “pack¬ 
ets,”  to  be  defined  below. 

The  third  and  final  level  of  composition  that  forms  C  is  a  loop  encapsu¬ 
lation  that  combines  B  with  the  mode  switches  that  control  certain  minor 
aspects  of  its  behavior.  The  switches  are  read  from  repeatedly  during  the 
idle  loop  of  P,  hence  this  encapsulation  results  in  some  simplification  of  the 
overall  behavior. 

The  interrupt-response  cycle  that  accomplishes  the  transmission  of  pack¬ 
ets  from  component  U  to  component  C  forms  a  loop  (Figure  5.17).  U  inter¬ 
rupts  C  by  asserting  its  int  signal,  C  responds  by  asserting  RD;  two  eight-bit 
words  forming  a  packet  are  then  transmitted  from  U  to  C  as  the  signal  pack¬ 
ets.  The  combined  behavior  of  these  two  components  is  complex,  and  there 
may  be  hundreds  of  interrupt  cycles  for  a  single  mouse  motion.  Encapsulat¬ 
ing  the  loop  as  component  E  and  using  temporal  abstractions  can  reduce  the 
behavioral  complexity  to  manageable  proportions.  The  temporal  abstrac¬ 
tions  that  apply  to  U  and  C  individually  have  been  discussed  earlier;  here 
only  the  combined  behavior  of  the  two  is  considered.  That  behavior  lends 
itself  to  temporal  abstractions  that  reduce  it  to  only  four  distinct  states  (Fig¬ 
ure  5.18).  The  four-state  diagram  arises  as  a  consequence  of  the  following 
observations  about  U  and  about  the  instruction-level  behavior  of  C: 

1.  The  interrupt-cycle  interaction  between  U  and  C  is  fully  encapsulated 
within E. Furthermore, it was shown that the behavior of U is
event-preserving and state-free under the right temporal abstractions. Hence
the behavior of E is mostly dependent on the state-transition behavior
of  C. 

2. Like many state machines, C has a “reset” input that puts it into a Reset
state in which it does nothing. However, it also requires an initialization
procedure of about a hundred instructions before actually responding
to any inputs. This instruction sequence can be treated as a separate
Init state.

Figure 5.17: Components U and C Together form Component E

3.  C  is  fundamentally  interrupt-driven;  after  being  initialized,  most  of 
the  time  it  is  waiting  idly  for  interrupts.  More  important,  after  most 
interrupts  it  returns  to  the  same  state  as  it  had  before.  This  suggests 
that  most  of  its  behavior  can  be  captured  as  a  function  of  the  most 
recent  input  event,  without  any  reference  to  earlier  events.  This  is  its 
behavior  in  the  Monitor  state. 

4.  What  behavior  of  C  cannot  be  captured  as  a  function  of  the  most  recent 
event  is  capturable  in  terms  of  the  counting  and  duration  abstractions. 
Such  behaviors  all  occur  in  the  Local  state. 

The  behavior  of  E  in  each  of  its  four  states  can  now  be  discussed  in  more 
detail. 

While  in  the  Reset  state  all  of  the  outputs  of  E  are  held  constant;  to- 
brightness,  for  example,  is  nil.  E  remains  in  this  state  as  long  as  the  reset 
input  is  asserted  (0). 

The  Init  state  is  entered  when  the  reset  input  becomes  1  and  the  fre¬ 
quency  of  the  two-phase  clock  input  is  greater  than  0  and  less  than  5  Mhz. 
E remains in the Init state for 600 cycles of the two-phase clock input, since
there  are  about  a  hundred  instructions  and  each  requires  six  clock  cycles. 
There  are  a  few  output  operations  performed  during  the  Init  state:  the 
eight-bit  brightness  and  loudness  registers  are  set  to  an  average  value,  and 
an  initialization  sequence  of  some  40  bytes  is  sent  to  the  Serial  Encoder. 
Thus,  for  example,  the  output  signal  to-brightness  transmits  the  value  128 
during the Init state, while the to-serial signal transmits the special token
'init representing the initialization sequence of the Serial Encoder.

In  the  Monitor  state,  E  behaves  very  much  as  U  does:  events  on  the 
incoming  keyboard  and  mouse  inputs  are  converted  to  packets  and  sent  to 
the  output,  in  this  case  the  to-serial  output.  The  sole  exception  is  the 
event '(Local Down), which is not transmitted but rather causes a state
transition  to  the  Local  state. 

The  complex  behavior  of  E  occurs  during  the  Local  state.  Events  on 
the  mouse  are  sent  unchanged  to  to-Serial  as  in  the  Monitor  state,  but  some 
keystrokes cause activity on the to-Audio, to-Brightness, and to-Loudness
output signals.

The G key is used to produce a tone on the speaker. While the G key
is held down (footnote 7) the to-audio signal carries a repeating sequence of integers
forming  a  sinusoidal  signal  of  frequency  1  Khz  and  of  amplitude  128.  For 
troubleshooting,  the  important  properties  of  this  signal  are  crossings  of  its 
midpoint  value,  both  in  its  first  and  second  derivatives  (as  introduced  in 
the  Audio  Decoder  troubleshooting  examples).  The  two  temporally  abstract 
signals  shown  below  both  have  the  value  “1  Khz”  while  G  is  pressed,  and  are 
0  otherwise: 


(fww 1sec '(nil t)
     (cross 127 (samp to-Audio to-Audio)))

(fww 1sec '(nil t)
     (cross 0 (dt (samp to-Audio to-Audio))))
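
The first of these abstractions can be illustrated numerically. The Python sketch below is only an approximation of the idea: it assumes a hypothetical sampling rate of 8000 samples per second, counts upward crossings only, and uses an amplitude of 127 so that the samples stay within eight bits; `fww` here is a stand-in for the windowed frequency abstraction, not its implementation.

```python
import math


def upward_crossings(samples, level):
    """Count transitions that rise from <= level to > level."""
    return sum(1 for a, b in zip(samples, samples[1:]) if a <= level < b)


def fww(samples, sample_rate_hz, window_s, level):
    """Crossing frequency in Hz over the most recent window."""
    n = int(sample_rate_hz * window_s)
    return upward_crossings(samples[-(n + 1):], level) / window_s


SAMPLE_RATE = 8000                     # hypothetical sampling rate
TONE_HZ = 1000                         # tone frequency, from the text
PERIOD = SAMPLE_RATE // TONE_HZ        # samples per cycle of the tone

tone = [
    round(128 + 127 * math.sin(2 * math.pi * (k % PERIOD) / PERIOD + math.pi + 0.1))
    for k in range(2 * SAMPLE_RATE)
]
silence = [128] * (2 * SAMPLE_RATE)

freq_while_pressed = fww(tone, SAMPLE_RATE, 1.0, 127)
freq_otherwise = fww(silence, SAMPLE_RATE, 1.0, 127)
```

The abstract signal distinguishes a tone from silence without any reference to the individual sample values, which is all that troubleshooting requires of it.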

The  B  key  is  used  to  brighten  the  screen  continuously,  from  0  up  to  a 
maximum  brightness  of  255.  While  the  B  key  is  held  down,  the  to-Brightness 
signal increases at a rate of one step every 3 msec until it reaches 255. Conversely,
the  D  key  dims  it.  Just  as  the  “counting”  behavior  of  the  Reset  Hold  Counter 
could  be  expressed  in  terms  of  the  duration  abstraction,  similarly  the  to- 
brightness  output  can  be  expressed  in  terms  of  the  lengths  of  time  the  B  and 
D  keys  have  been  pressed. 
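
A minimal sketch of that dependence on durations follows. It assumes that the B key is held first and the D key afterwards, and that dimming proceeds at the same rate as brightening; the function name and the order of operations are illustrative assumptions, not taken from the implementation.

```python
STEP_MS = 3            # one brightness step every 3 msec, from the text
MAX_BRIGHTNESS = 255   # maximum brightness, from the text


def brightness(initial, ms_b_held, ms_d_held):
    """Brightness after B is held for ms_b_held, then D for ms_d_held."""
    level = min(MAX_BRIGHTNESS, initial + ms_b_held // STEP_MS)
    return max(0, level - ms_d_held // STEP_MS)
```

Starting from the initialization value of 128, holding B for 30 msec gives `brightness(128, 30, 0)`, ten steps brighter, while holding it for a full second saturates at 255.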

The L and Q keys work analogously to the B and D keys, sending to the
to-loudness signal and making subsequent audio signals louder (L) and quieter
(Q).

The  six  rules  for  the  Reset  Hold  Counter  together  implemented  the  three- 
state  automaton  shown  in  Figure  5.12  (Page  142)  —  not  an  unusual  ratio  of 
rules-to-states,  since  a  typical  state  diagram  will  require  roughly  one  tran¬ 
sition  rule  per  arc  and  one  persistence  rule  per  state.  Writing  the  rules  is 
sufficiently  tedious  that  a  prerequisite  to  managing  a  large  finite-state  dia¬ 
gram  would  be  to  develop  machinery  for  automatically  translating  the  graph 
into rules. Because that has not yet been done, the temporally abstract
behavior of the component E, with its four states and eight arcs, is the largest
behavior  implemented  to  date.  The  transition  and  persistence  rules  for  E 
and  its  subcomponents  are  sufficiently  similar  to  those  for  the  Reset  Hold 
Counter  that  they  will  not  be  duplicated  here. 
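
The rule count follows directly from the shape of the state diagram, as the following Python sketch of such a translation suggests. It is only an outline of the missing machinery: the generated strings are placeholders for real rules, and the three-state graph is hypothetical rather than the actual Reset Hold Counter.

```python
def rules_from_graph(states, arcs):
    """Return rule descriptions: one per arc plus one per state."""
    rules = [
        "transition: in %s, on %s, go to %s" % (src, cond, dst)
        for src, cond, dst in arcs
    ]
    rules += [
        "persistence: stay in %s while no exit condition holds" % s
        for s in states
    ]
    return rules


# Hypothetical three-state graph with three arcs.
states = ["Idle", "Counting", "Done"]
arcs = [
    ("Idle", "start asserted", "Counting"),
    ("Counting", "count reaches limit", "Done"),
    ("Done", "reset asserted", "Idle"),
]
rules = rules_from_graph(states, arcs)
```

Three states and three arcs yield six rules; by the same count, a diagram with four states and eight arcs would need roughly twelve.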

7. That is, while (aref (kbd-state time) (key->pos 'G)) is 0.

This completes the temporally abstract behavior of E. The important
point is its simplicity: perhaps not simple in comparison with the
behavior of a boolean gate, but vastly simpler than the behaviors of
the underlying microprocessors. The simplicity arises from the fact that by
encapsulating the complex interacting state machines within a single
component and expressing the inputs and outputs with temporally abstract signals,
the result can be expressed with far fewer states. Yet it retains a useful degree
of predictive power: for example, it predicts that pressing the keys Local and
B will cause the brightness output to increase, a prediction sufficient that, if
the brightness does not increase, the right suspects will be identified.

5.9 Related Work

There  are  numerous  formalisms,  languages  and  programs  for  reasoning  about 
time  and  change.  For  the  present  purpose  it  is  sufficient  to  briefly  identify 
four  salient  expressiveness  and  tractability  issues  and  to  point  out  that  TINT 
takes  an  extreme  position,  always  favoring  tractability  over  expressiveness. 

5.9.1  Temporally  Quantified  Statements 

Systems  that  reason  about  time  can  in  part  be  characterized  in  terms  of 
the  kinds  of  facts  that  they  allow  to  be  temporally  quantified.  Some  sys¬ 
tems  admit  only  statements  about  parameter  values,  where  the  parame¬ 
ters  may  be  either  continuous  or  discrete  quantities  [Simmons83]  [Bobrow85] 
[Williams86] [Kohane87]. Treating propositions as boolean valued functions
(often  called  time  tokens)  allows  any  atomic  proposition  to  be  quantified 
[Dean87]  [Shoham87].  There  have  been  proposals  to  allow  arbitrary  first 
order  sentences  to  be  temporally  quantified  [McDermott82]  [Moszkowski82] 
[Allen84],  but  there  is  no  successful  implementation  of  such  a  language.  TINT 
“signals”  fall  into  the  first  of  these  categories. 

5.9.2  Intervals  and  Constraints  on  Intervals 

Timestamping facts so that they hold at single time points is the most primitive
form of temporal quantification, but this is hardly ever used except in a
theoretical  setting  [Hanks86]  [Shoham86].  A  slightly  improved  scheme  is  that 
used  in  TINT  and  TCS  [Russ86],  in  which  statements  hold  over  intervals  with 
fixed numeric upper and lower bounds. Discovering intersections between intervals
is trivial, but the expressiveness of such schemes is quite limited. One
fundamental  difficulty  is  that  systems  with  feedback  can  result  in  runaway 
inference  loops  for  which  each  new  deduction  only  marginally  extends  the 
previous  history.  The  alternative  is  to  allow  algebraic  constraints  of  varying 
sophistication  among  the  intervals  on  sparse  and  dense  sets  of  points.  The 
straightforward  approach  is  to  do  so  with  inequalities  on  the  endpoints  of  the 
intervals  [Valdes86]  [Williams86]  [Kohane87]  [Ladkin87];  a  different  approach 
is  to  use  an  algebra  of  intervals  [Allen83]  [Vilain86]  [Valdes87].  With  either 
approach,  there  is  a  tradeoff  between  expressiveness  and  the  tractability  of 
detecting interval overlaps; the more complex the constraints and the more
complete the constraint propagator, the weaker the performance guarantees
that can be made. Allen's constraint propagation scheme is a typical compromise:
the propagator is O(n^3) in the number of intervals but will not
detect all inconsistent orderings, since the latter is NP-complete [Vilain86].
As  noted  earlier,  TINT  takes  an  extreme  position  in  favor  of  tractability, 
thereby  avoiding  most  of  these  issues.  With  fixed  numeric  bounds,  detecting 
overlap  is  trivial,  and  while  runaway  inferences  cannot  be  prevented  they  are 
at  least  easy  to  detect  using  bounds  on  the  number  of  predications  in  each 
history. 
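
The two claims above, that overlap detection is trivial with fixed numeric bounds and that runaway inference can be caught by bounding the size of a history, can be illustrated with a minimal sketch. The names Interval, overlaps, intersection, add_predication, and MAX_PREDICATIONS are hypothetical, and closed intervals are assumed; this is not TINT's actual code.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Interval:
    """Closed interval [lower, upper] with fixed numeric bounds (hypothetical type)."""
    lower: float
    upper: float


def overlaps(a: Interval, b: Interval) -> bool:
    # Two closed intervals share a point exactly when each one starts
    # no later than the other ends: a constant-time test.
    return a.lower <= b.upper and b.lower <= a.upper


def intersection(a: Interval, b: Interval) -> Optional[Interval]:
    """Return the common sub-interval, or None if the intervals are disjoint."""
    if not overlaps(a, b):
        return None
    return Interval(max(a.lower, b.lower), min(a.upper, b.upper))


MAX_PREDICATIONS = 100  # hypothetical cap on the size of one history


def add_predication(history: List[Interval], new: Interval,
                    limit: int = MAX_PREDICATIONS) -> None:
    """Append to a history, flagging a suspected runaway loop at the cap."""
    if len(history) >= limit:
        raise RuntimeError("history exceeded its bound; suspected runaway inference")
    history.append(new)
```

The cap does not prevent a feedback loop from generating marginal extensions; it only makes the condition cheap to notice, which is the tradeoff described above.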

5.9.3  Persistence 

The  world  has  inertia.  Many  programs  for  maintaining  temporal  assertions 
reflect  this  by  building  in  implicit  persistence  of  facts  over  time.  For  example, 
TMM  [Dean87]  will  autonomously  assume  the  persistence  of  any  fact  in  order 
to  answer  queries.  TINT  and  TCP  [Williams86],  on  the  other  hand,  do  not; 
only  the  application  program  can  add  underived  facts  about  the  duration  of 
intervals.  The  simple  machinery  in  TINT  never  introduces  new  assumptions 
on  its  own,  and  so  as  a  consequence  there  is  an  explicit  justification  for  every 
prediction.  This  is  just  what  is  needed  for  troubleshooting. 

5.9.4  Temporal  Indexing 

Database  organization  obviously  has  an  impact  on  the  kinds  of  queries  that 
will be answered efficiently. A recurrent concern in temporal reasoning programs
is how a database of temporally quantified statements should be indexed.
A common approach is to organize all the intervals referring to a
single  parameter,  token,  proposition,  or  signal  into  a  totally  ordered  list 
[Williams86]  [Dean87].  An  alternative  is  to  organize  the  intervals  into  a 
hierarchy  such  that  all  the  intervals  at  the  leaves  occur  close  together  in 
time,  irrespective  of  the  propositions  they  refer  to  [Kahn77]  [Dean87].  These 
schemes are not incompatible; in fact most systems use multiple indices
or a hybrid approach. TINT does not; it simply orders all the intervals
referring  to  a  given  signal  by  increasing  lower  bounds. 
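
A per-signal index ordered by increasing lower bound might look like the following sketch. The class SignalIndex and its methods are hypothetical names chosen for illustration, and the entries are assumed to be closed intervals carrying a value; the actual TINT data structures are not described here.

```python
import bisect
from typing import Any, Dict, List, Tuple

Entry = Tuple[float, float, Any]  # (lower bound, upper bound, value)


class SignalIndex:
    """Keeps the intervals of each signal sorted by increasing lower bound."""

    def __init__(self) -> None:
        # signal name -> (sorted lower bounds, entries in the same order)
        self._by_signal: Dict[str, Tuple[List[float], List[Entry]]] = {}

    def add(self, signal: str, lower: float, upper: float, value: Any) -> None:
        lowers, entries = self._by_signal.setdefault(signal, ([], []))
        pos = bisect.bisect_right(lowers, lower)  # keep the list ordered
        lowers.insert(pos, lower)
        entries.insert(pos, (lower, upper, value))

    def overlapping(self, signal: str, lower: float, upper: float) -> List[Entry]:
        """Return the entries for a signal that intersect [lower, upper]."""
        _, entries = self._by_signal.get(signal, ([], []))
        hits = []
        for lo, hi, value in entries:
            if lo > upper:
                break  # ordered by lower bound: no later entry can overlap
            if hi >= lower:
                hits.append((lo, hi, value))
        return hits
```

Because each list is ordered, a query can stop scanning as soon as it reaches an interval that starts after the query ends.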

5.10  Summary  of  Behavior  Representation

TINT  is  a  temporal  reasoning  system  that  propagates  assertions  about  time- 
varying values at multiple levels of abstraction. The framework of signals,
abstractions, and behaviors means that it can be very simple in its syntax,
semantics,  and  computational  machinery.  There  are  three  key  reasons  that 
TINT  can  be  so  simple  and  still  allow  the  representation  and  troubleshooting 
of  complex  circuits. 

First,  there  is  a  rich  vocabulary  of  temporal  abstractions  with  which  to 
describe behavior. These temporal abstractions include such familiar concepts
as change, cycle, and frequency. Good abstractions for troubleshooting
preserve fidelity, strength, and efficiency by sacrificing precision. Temporal
abstractions are good for representing digital circuits for troubleshooting because
they can make the prediction task much more efficient, while preserving
fidelity  and  precision  for  those  signal  properties  that  the  troubleshooter  can 
easily  observe  and  that  will  be  disrupted  by  typical  failures. 

Second, there are principles by which temporally abstract behavior definitions
can be built for many circuits. Temporal abstractions result in strong
abstract behaviors when the underlying behaviors are event preserving. Since
not all components have behaviors that are event preserving, the techniques
of reduction and synchronization are ways of taking subsets of behavior
that are event preserving. Encapsulating loops allows these former abstraction
techniques to be applied to groups of connected components.

Third, there is an important distinction between the definitions of the behavior
of individual components and the deductions that will be made about
them during troubleshooting. There are many logical consequences of each
abstraction and behavior definition that would lead to useless deductions
during the prediction subtask of troubleshooting. TINT rules for each abstraction
and behavior are included only when they make deductions about
observable  signals  or  when  the  deductions  about  signal  values  that  they  make 
hold over significant stretches of time.

Chapter 6

Representing  Faults  and 
Misbehaviors 


The  goal  of  a  troubleshooting  program  is  not  mere  generation  of  candidates, 
but efficient discrimination among them. However, there are three fundamental
obstacles to efficient discrimination. First, the observations that the troubleshooter
makes of the device may be imprecise. As a consequence it may
be impossible to distinguish between some candidates. Second, some component
behaviors may be so complex as to be intractable to reason about in any
way other than from causes to effects. As a consequence the troubleshooting
engine might not find all the conflicts derivable from the observations
it  has  made  and  hence  inconsistent  candidates  may  survive.  Third,  even  if 
reasoning  from  effects  to  causes  is  possible,  there  may  be  reasoning  impasses 
that  leave  ambiguities  resolvable  only  through  intractable  techniques.  Again, 
the  troubleshooting  engine  may  not  find  all  the  derivable  conflicts,  so  that 
inconsistent  candidates  may  survive. 

In  the  face  of  these  fundamental  difficulties  a  partial  solution  is  to  draw  a 
distinction  between  the  possibility  of  a  candidate  and  its  plausibility  relative 
to  other  candidates.  Instead  of  asking  for  the  logically  possible  candidates,  a 
more realistic goal is to ask for the most likely candidates among those possible.
The program can then terminate when any desired degree of certainty
is achieved, that is, after some diagnosis is significantly more likely than the
others. As an additional benefit, the choices about which observations to
perform will be more efficient because they will be biased toward discriminating
between the most likely candidates, no matter what certainty is set as
the termination goal. There is always the danger that estimates of relative
likelihood  will  be  inaccurate.  It  is  possible  that  with  bad  estimates  and  a 
low  threshold  of  certainty  for  termination,  the  program  could  terminate  with 
an  incorrect  diagnosis.  Commitment  to  using  estimates  of  the  likelihood  of 
candidates  implies  a  commitment  to  being  circumspect  about  any  decisions 
the  program  makes  that  are  overly  sensitive  to  those  estimates.  Nevertheless, 
even  giving  candidates  crude  likelihood  estimates  can  provide  a  useful  degree 
of  bias. 

Ranking  candidates  by  their  likelihood  opens  up  new  sources  of  knowledge 
to  take  advantage  of.  An  obvious  source  of  knowledge  concerns  the  relative 
failure rates of the individual components in the candidates. These are ultimately
grounded in accumulated statistical data but can also be partially
derived  from  knowledge  about  the  physical  construction  of  the  components. 
Another  source  of  knowledge  is  fault  models  —  knowledge  not  just  about 
how  often  components  fail,  but  also  about  how  they  usually  fail  and  their 
misbehavior  when  they  do.  This  kind  of  knowledge  is  used  in  a  number  of 
model-based troubleshooting programs including SOPHIE [Brown82] and
IDS  [Pan84]. 

In typical uses of fault models, each component has a set of misbehaviors
that  is  assumed  to  be  exhaustive;  candidates  can  be  ruled  out  by  showing 
that  none  of  their  known  misbehaviors  are  consistent  with  observations.  But 
the  crucial  point  is  that  the  program  does  not  need  to  have  an  exhaustive  set 
of  all  the  ways  any  given  component  can  fail  —  it  need  not  know  any  at  all,  in 
fact.  However,  if  knowledge  is  available  about  a  component  misbehavior  that 
can  result  from  some  physical  failures  and  the  proportion  of  failures  in  that 
component  that  would  result  in  that  misbehavior,  then  the  troubleshooting 
engine  can  take  advantage  of  it.  By  knowing  one  or  two  of  the  most  likely 
failure  modes  of  a  component  the  program  can  make  a  better  estimate  of 
the  likelihood  that  it  is  actually  faulty.  For  example,  suppose  that  telephone 
jacks  fail  in  dozens  of  different  ways,  but  that  when  they  fail,  half  of  the  time 
the  effect  (the  misbehavior)  is  as  if  all  of  the  contacts  were  open  circuits,  and 
the  other  half  of  the  time  the  effects  are  different.  This  knowledge  can  be  used 
to  adjust  the  likelihoods  of  candidates  that  hypothesize  the  jack  is  broken. 
If  the  observations  of  the  circuit  indicate  that  it  would  be  inconsistent  for 
all  the  contacts  to  be  open  then  the  jack  is  a  relatively  less  likely,  though 
still  a  possible  candidate.  No  coverage  has  been  sacrificed.  The  program 
has simply done what a human troubleshooter would do: it has brought
to bear knowledge about the way things usually break to focus on the most
likely  possibilities. 

Fault  models,  then,  can  be  used  as  heuristics  within  a  larger  framework 
of  failure  likelihoods.  Although  this  chapter  is  mainly  about  fault  models, 
the  first  portion  is  spent  presenting  failure  likelihoods  as  a  partial  solution 
to  difficulties  in  discriminating  candidates.  Next,  syndromes  are  presented 
as  a  refinement  of  that  solution.  Syndromes  are  the  concrete  representation 
in  BASIL  and  TINT  for  the  abstract  notion  of  a  fault  model.  They  are 
added  manually  to  the  knowledge  about  a  particular  circuit;  they  are  not 
learned  or  otherwise  automatically  generated.  Next,  several  principles  for 
the  appropriate  use  of  syndromes  in  representing  circuits  will  be  presented, 
along  with  examples  appearing  in  the  Console  Controller  Board.  Finally,  the 
consequences  of  using  knowledge  about  syndromes  in  troubleshooting  will  be 
discussed. The mechanics of ranking candidates by likelihood and of using
syndromes  to  adjust  those  rankings  will  be  treated  in  the  next  chapter  along 
with  other  details  of  the  troubleshooting  engine. 


6.1  Failure  Likelihoods 

Estimating failure probabilities in general is subtle and complex; a very simple
framework is used here. For example, independence between failures is
assumed, a strong simplifying assumption (although not as strong as assuming
that failure effects are independent). This simple framework is adequate
because  (as  discussed  later)  the  probabilities  are  used  in  such  a  way  that  the 
overall  performance  of  the  troubleshooting  engine  is  relatively  insensitive  to 
small  variations  in  these  estimates. 

The  status  of  each  BASIL  component  indicates  whether  it  is  believed 
to be physically damaged. The status-of predicate denotes this:
[status-of U25 working] means that chip U25 is believed to be
undamaged. The status other means that the component is believed to
be  damaged  in  some  way.  A  prior  probability  is  assigned  to  the  working 
status  for  each  component,  and  the  probability  of  having  status  other  is 
then  the  difference  between  1  and  the  probability  of  it  working.  As  discussed 
in  Chapter  2,  these  prior  probabilities  influence  the  ranking  of  candidates 
and  probe  suggestions  produced  by  the  troubleshooting  engine:  candidates 
involving the components with higher probabilities of having status other
will appear to be likelier candidates, and the probes that the troubleshooter
suggests will tend to be those that discriminate among the likelier candidates.

The probability of a given component working is estimated from its “complexity”,
a nonnegative integer representing the number of breakable physical
parts and how likely they are to break. Assuming independence, the
probability  of  a  component  having  status  working  is  the  probability  that  all 
its  components  are  working.  The  probability  of  failure  in  a  component  with 
complexity  1  has  been  assigned  .0001  —  any  number  very  close  to  0  could 
have  been  used.  Some  typical  probabilities  for  various  components  are  shown 
below: 


Component              Complexity   Probability of working
Etch                        1       .9999^1    = .9999
Chiplet                     1       .9999^1    = .9999
Pin                         2       .9999^2    = .9998
16-pin Chip                33       .9999^33   = .997
Oscillator Chiplet        100       .9999^100  = .99
Console Controller¹    ≈ 2000       .9999^2000 = .82


There  are  better  ways  of  estimating  failure  rates;  the  power  dissipation 
of  the  chip,  for  example,  would  probably  be  a  better  predictor.  This  scheme 
has  the  advantage  that  it  can  be  derived  from  the  representation  of  physical 
structure once a basic unit of complexity has been chosen.
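
The complexity scheme reduces to a single exponentiation. The sketch below assumes the unit failure probability of .0001 given above; the function name p_working is hypothetical.

```python
UNIT_FAILURE = 0.0001  # probability of failure at complexity 1, as in the text


def p_working(complexity: int, unit_failure: float = UNIT_FAILURE) -> float:
    """Probability that a component works, assuming that its `complexity`
    unit parts fail independently of one another."""
    return (1.0 - unit_failure) ** complexity


if __name__ == "__main__":
    for name, complexity in [("Etch", 1), ("Pin", 2), ("16-pin Chip", 33),
                             ("Oscillator Chiplet", 100)]:
        print(f"{name:20s} {complexity:5d}  {p_working(complexity):.4f}")
```

A complexity of 0 yields a probability of exactly 1.0, which is the convention used for etches and ordinary chiplets in the Clock Generator example.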

The  prior  probabilities  assigned  to  each  component  status  influence  the 
candidate  rankings  and  probe  suggestions.  The  likelihood  of  a  candidate  is 
the  normalized  probability  that  all  the  components  in  the  device  have  the 
status  assigned  by  that  candidate.  The  Clock  Generator  provides  a  simple 
example.  Assume  that  etches  and  chiplets  other  than  the  oscillator  have 
complexity  0  so  that  their  probability  of  working  is  1.0;  the  three  components 
and  their  likelihoods  are  then: 


Component   Kind          Complexity   Probability of working
U25         Oscillator       100       p(U25) = .9900
U32         14 pin chip       28       p(U32) = .9972
U30         16 pin chip       32       p(U30) = .9968

¹ .0001 is actually too large, as can be seen from this anomaly. It is used only to simplify
presentation.

Figure 6.1: Clock Generator


Suppose  that  a  discrepancy  is  observed  at  (out  q  u30b),  resulting  in 
the  conflict  (U25,  U30,  U32).  The  candidates  are  the  minimal  covering  sets 
[U25],  [U30],  and  [U32].  The  probability  of  each  of  these  candidates  is  the 
probability  that  the  named  component  is  not  working  and  that  the  others  are. 
A  weight  for  each  candidate  is  then  computed  as  the  probability  normalized 
with  respect  to  all  candidates: 

Diagnosis   Likelihood                                  Weight
[U25]       (1 - p(U25)) × p(U30) × p(U32) = .0099       .63
[U30]       p(U25) × (1 - p(U30)) × p(U32) = .00315      .20
[U32]       p(U25) × p(U30) × (1 - p(U32)) = .00278      .17

As  this  example  shows,  candidates  involving  components  with  relatively 
higher  failure  likelihoods  tend  to  end  up  with  the  largest  weights.  In  this 
case  the  rankings  are  stable  under  perturbations  in  the  component  failure 
likelihoods  so  long  as  their  ordering  is  maintained,  that  is,  so  long  as  the 
physical  complexity  of  U25  is  greater  than  that  of  U30,  and  of  U30  greater 
than  U32. 
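
The weight computation for single-component candidates can be sketched as follows. The function names are hypothetical, and the code assumes independent failures and the unit failure probability of .0001 used throughout this section.

```python
from typing import Dict

UNIT_FAILURE = 0.0001  # failure probability at complexity 1, as in the text


def p_working(complexity: int) -> float:
    return (1.0 - UNIT_FAILURE) ** complexity


def candidate_weights(complexities: Dict[str, int]) -> Dict[str, float]:
    """Weight of each single-fault candidate: the probability that the named
    component is broken and all the others work, normalized over candidates."""
    p = {name: p_working(c) for name, c in complexities.items()}
    likelihood = {}
    for name in p:
        value = 1.0 - p[name]            # the named component is not working
        for other, p_other in p.items():
            if other != name:
                value *= p_other         # every other component is working
        likelihood[name] = value
    total = sum(likelihood.values())
    return {name: value / total for name, value in likelihood.items()}


if __name__ == "__main__":
    weights = candidate_weights({"U25": 100, "U30": 32, "U32": 28})
    for name, weight in weights.items():
        print(name, round(weight, 2))
```

For the Clock Generator complexities the weights come out at roughly .63, .20, and .17, and the ordering follows the ordering of the complexities.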

The  troubleshooting  engine  can  stop  when  there  is  one  candidate  above 
some  threshold,  which  is  usually  almost  1.  The  relative  proportions  of  the 
failure likelihoods among components can influence the decision to terminate.
In the example above, if the threshold were set to .90, the program
would terminate when one of the candidates had weight above .90. Had the
physical complexity of U25 been 600 it would have had 90% of the weight
and the program would have stopped, concluding that U25 was most likely
to  be  broken.  Note  that  it  took  more  than  an  order  of  magnitude  difference 
between  complexity  estimates  —  600  being  nineteen  times  as  large  as  the 
complexity  of  U32  —  to  get  this  effect,  however.  Higher  thresholds  require 
bigger relative differences; for example, a threshold of .95 would have required
the complexity of U25 to be 1100 for termination without further
observations. 
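
The sensitivity of termination to the complexity of U25 can be explored with a short search. This is only a sketch under the same assumptions as the example above (independent failures, unit failure probability .0001, U30 and U32 fixed at complexities 32 and 28); the function names are hypothetical.

```python
from typing import Optional

UNIT_FAILURE = 0.0001
C_U30, C_U32 = 32, 28  # complexities from the Clock Generator example


def p_working(complexity: int) -> float:
    return (1.0 - UNIT_FAILURE) ** complexity


def weight_of_u25(c_u25: int) -> float:
    """Normalized weight of candidate [U25] when U25 has complexity c_u25."""
    p25, p30, p32 = p_working(c_u25), p_working(C_U30), p_working(C_U32)
    l25 = (1 - p25) * p30 * p32
    l30 = p25 * (1 - p30) * p32
    l32 = p25 * p30 * (1 - p32)
    return l25 / (l25 + l30 + l32)


def smallest_complexity_reaching(threshold: float,
                                 limit: int = 10_000) -> Optional[int]:
    """Smallest complexity of U25 whose weight meets the threshold, if any."""
    for c in range(1, limit + 1):
        if weight_of_u25(c) >= threshold:
            return c
    return None
```

Under these assumptions a complexity of 600 clears the .90 threshold but not .95, and a complexity of 1100 clears .95, in agreement with the figures quoted above.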

If  no  candidate  is  above  threshold  these  candidate  weights  are  used  to  help 
decide  where  the  next  probe  should  be  made.  To  a  crude  first  approximation, 
the  choice  of  probe  location  will  be  biased  toward  places  close  to  the  higher 
ranked candidates. In the example above, for instance, (out 0 u25a) would
be  chosen  over  (out  y  u32a)  as  long  as  the  complexity  of  U25  was  greater 
than  that  of  U30.  The  details  of  probe  selection  will  be  presented  later;  the 
important  point  for  the  moment  is  that  the  better  the  estimate  of  component 
failure  likelihoods  the  fewer  probes  will  be  needed  on  average  in  the  long  run. 

Using failure likelihoods provides an incremental improvement in the ability
of a troubleshooting engine to distinguish candidates. By presenting the
plausible candidates in addition to the possible ones and biasing the observations
made in favor of the likely failures, the troubleshooting engine should
be  able  to  provide  the  right  diagnoses  most  of  the  time  using  fewer  probes. 

6.2  Representing  Syndromes 

Fault models provide an additional increment of power to the troubleshooting
engine because they can be used to make better estimates of candidate
likelihoods.  Roughly,  this  is  done  by  (i)  splitting  the  weight  assigned  to  a 
given  candidate  into  portions,  one  for  each  way  that  some  component  in  that 
candidate  might  be  misbehaving,  and  (ii)  showing  that  one  or  more  of  those 
portions  corresponds  to  an  inconsistent  diagnosis.  If  (ii)  succeeds  this  means 
that  that  component  was  not  as  likely  to  be  broken  as  was  thought.  That 
candidate  will  be  made  relatively  less  likely,  thereby  indirectly  boosting  the 
other  candidates.  The  details  of  how  the  troubleshooting  engine  performs 
these  steps  will  be  presented  in  the  next  chapter.  The  present  concern  is  the 
representation of how components misbehave and how likely they are to do
so.

A syndrome is a set of sets of physical failures that result in equivalent
misbehaviors of a component. Since the misbehavior of a component is relative
to its intended behavior, each syndrome is thus tied implicitly to a
level  of  behavioral  abstraction.  For  example,  consider  an  imaginary  chip 
inverter-chip  with  four  pins  (power,  ground,  input,  output)  and  just  one 
inverter  on  it.  Some  of  the  following  are  physical  failures  inside  the  chip:  (a) 
the  pulldown  is  open  (b)  the  output  pin  is  open  (c)  the  pullup  is  shorted  (d) 
the  pulldown  is  shorted  (e)  the  input  pin  is  open.  Three  example  syndromes 
are: 

1. Several different combinations of physical failures would cause the inverter
to produce a constant output logic-level of 1. Its pulldown might
be open, its output pin might be open (since TTL floats high), its pulldown
might be open and its pullup shorted, and so on. This is the set
of sets {{a}, {b}, {a,c}, ...}.

2.  Another  set  of  combinations  of  failures  cause  the  inverter  to  produce 
a  constant  logic-level  of  0.  Its  input  pin  might  be  open,  its  pulldown 
might  be  shorted,  and  so  on.  This  is  the  set  of  sets  {{e},  {d},  . . . }. 

3.  Both  sets  of  failures  described  above  cause  the  inverter  to  produce 
a  constant  frequency  of  0.  The  union  of  those  sets  is  thus  another 
syndrome.  This  is  the  set  of  sets  {{a},  {b},  {a,c},  {e},  {d}  . . . }. 
Although  in  principle  syndromes  can  thus  intersect,  in  practice  the 
syndromes  for  a  given  component  are  disjoint  sets. 
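
The set-of-sets structure of the three example syndromes can be written down directly. In the sketch below the letters a through e are the physical failures listed above, only the explicitly listed combinations are included (the elided ones are omitted), and the names stuck_at_1 and stuck_at_0 are hypothetical labels for syndromes 1 and 2.

```python
# Physical failures of the imaginary inverter-chip, lettered as in the text:
# a: pulldown open, b: output pin open, c: pullup shorted,
# d: pulldown shorted, e: input pin open.

# Syndrome 1: constant output logic level of 1.
stuck_at_1 = frozenset({frozenset({"a"}), frozenset({"b"}), frozenset({"a", "c"})})

# Syndrome 2: constant output logic level of 0.
stuck_at_0 = frozenset({frozenset({"e"}), frozenset({"d"})})

# Syndrome 3: constant output frequency of 0, the union of the two above.
zerof = stuck_at_1 | stuck_at_0


def has_syndrome(failures: set, syndrome: frozenset) -> bool:
    """True if this exact combination of failures belongs to the syndrome."""
    return frozenset(failures) in syndrome
```

At the logic-level abstraction, syndromes 1 and 2 are disjoint; at the coarser frequency abstraction they merge into zerof, which is why a syndrome is tied to a level of behavioral abstraction.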

Syndromes are sets of sets of failures, but for mnemonic value they are
usually named according to the misbehavior that results. For example, syndrome
3 above, which caused the inverter-chip output frequency to be zero,
will be denoted zerof.

The status-of predicate is used to indicate the belief that a given component
has a particular syndrome. Thus [status-of i zerof] says that
component i has some physical failure among the set causing it to output a
constant frequency of zero. The status working corresponds to an empty set
of failures; the predication [status-of i working] says that the physical
component i has no failures and is working perfectly.

An estimated likelihood is assigned to each of the possible statuses
of a physical component, using the complexity estimates introduced earlier.
For example, assume that pins have complexity 2 and everything else
has complexity 0. Then the likelihood that the inverter-chip is working
is estimated as .9999^8, the likelihood that all four pins are working.
The likelihood that the inverter-chip has syndrome zerof is estimated as
4 × ((1 - .9999^2) × .9999^6), the likelihood that exactly one of the four
pins  is  independently  broken.  This  is  only  an  estimate,  since  on  the  one 
hand  there  might  be  failures  in  the  pins  other  than  opens,  but  on  the  other 
hand  multiple  pin  failures  that  would  cause  the  same  syndrome  are  not  being 
counted.  Finally,  the  likelihood  that  it  has  status  other  is  then  1  minus  the 
likelihoods  of  these  other  two  statuses: 

Inverter-Chip

Syndrome   Likelihood
working    .9999^8                         = .9992
zerof      4 × ((1 - .9999^2) × .9999^6)   = .0007
other      1 - .9992 - .0007               = .0001

The  troubleshooting  engine  can  use  this  information  to  try  to  reduce 
the  likelihood  of  candidates  involving  inverter-chip  components.  Suppose 
there  were  a  candidate  corresponding  to  a  particular  inverter-chip  i  being 
broken.  This  candidate  and  its  weight  would  be  split  into  two  portions  — 
one corresponding to the hypothesis that i had status zerof, the other to
the hypothesis that i had status other. Suppose its weight had been .40. To
a  first  approximation  the  weight  would  be  split  proportionately  among  these 
two  according  to  their  relative  likelihoods  .0007  and  .0001,  in  this  case  .35 
and  .05.  Now,  if  observations  indicate  that  i  cannot  have  status  zerof  the 
weight  of  that  portion  (.35)  would  get  redistributed  among  all  candidates. 
For  example,  suppose  there  had  been  two  other  candidates  each  with  weight 
.30;  after  redistributing  the  weight  .35  evenly  across  the  three  candidates, 
two  would  have  weights  of  .42  each  and  the  candidate  involving  i  would 
have  weight  only  .17.  Thus  the  likelihood  of  i  being  broken  relative  to  the 
other  candidates  will  have  been  decreased  from  .40  to  .17.  The  details  of  how 
this  is  done  are  presented  in  the  next  chapter. 
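
The arithmetic of the worked example above can be reproduced with a few lines. This is a sketch of the first approximation only: the function name is hypothetical, and the even redistribution follows the example rather than the engine's actual mechanism, which is described in the next chapter.

```python
from typing import Dict


def split_and_redistribute(weights: Dict[str, float], candidate: str,
                           status_likelihoods: Dict[str, float],
                           refuted_status: str) -> Dict[str, float]:
    """Split one candidate's weight across its statuses in proportion to their
    likelihoods, discard the refuted portion, and spread that portion evenly
    over all candidates (including the one it came from)."""
    total = sum(status_likelihoods.values())
    portions = {status: weights[candidate] * likelihood / total
                for status, likelihood in status_likelihoods.items()}
    share = portions.pop(refuted_status) / len(weights)
    result = {name: weight + share
              for name, weight in weights.items() if name != candidate}
    result[candidate] = sum(portions.values()) + share
    return result


if __name__ == "__main__":
    before = {"i": 0.40, "j": 0.30, "k": 0.30}   # j and k: the two other candidates
    after = split_and_redistribute(before, "i",
                                   {"zerof": 0.0007, "other": 0.0001}, "zerof")
    print({name: round(weight, 2) for name, weight in after.items()})
```

The total weight is conserved, so the result is still a normalized ranking: refuting zerof moves weight away from i without ruling it out.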

To  gain  anything  from  a  syndrome  the  behavior  model  must  be  able  to 
detect  that  it  is  inconsistent  with  observations  that  the  troubleshooter  has 
made.  Thus  each  component  status  has  consequences  in  the  behavior  model. 
Recall  that  if  a  physical  component  has  the  status  working,  has  power,  and  so 
on, then its mode is normal. In the case of the inverter-chip, for example:

If   [isa ?x inverter-chip]
and  [status-of ?x working]
and  [thru ?1 ?u (power (in power ?x)) t]
Then [thru ?1 ?u (mode ?x) normal]

Having  a  status  of  zerof ,  however,  implies  a  mode  of  inactive  no  matter 
whether  the  component  has  power  or  not: 

If   [isa ?x inverter-chip]
and  [status-of ?x zerof]
Then [thru -oo +oo (mode ?x) inactive]

In  the  inactive  mode  the  output  frequency  is  zero: 

If   [isa ?x inverter-chip]
and  [thru ?1 ?u (mode ?x) inactive]
and  Signal (fww ?w ?c (11 (out y ?x))) exists
Then [thru ?1 ?u (fww ?w ?c (11 (out y ?x))) 0]

The indirection from the status of “zerof” to the mode of “inactive”
makes  writing  behavior  rules  more  convenient.  For  one  thing,  the  status 
of  a  component  has  no  temporal  bounds,  but  the  mode  signal  does.  For 
another  thing,  only  physical  components  are  given  failure  syndromes,  while 
only  functional  components  have  behaviors.  Finally,  there  are  other  ways  of 
being  in  inactive  mode,  such  as  losing  power: 

If  [isa  ?x  inverter-chip] 
and  [status-of  ?x  ?anything] 
and  [thru  ?1  ?u  (power  ?x)  nil] 

Then  [thru  ?1  ?u  (mode  ?x)  inactive] 

The following section will clarify this by giving examples of several syndromes
and their associated misbehaviors.
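
The indirection from status to mode to output can be paraphrased procedurally. The sketch below is not the TINT rule language: it ignores temporal bounds entirely, and the function names are hypothetical. It only shows how the inverter-chip rules above combine.

```python
from typing import Optional


def mode(status: str, has_power: bool) -> Optional[str]:
    """Mode implied by the inverter-chip rules, or None if no rule applies."""
    if status == "zerof":
        return "inactive"        # holds whether or not the chip has power
    if not has_power:
        return "inactive"        # any status: losing power forces inactivity
    if status == "working":
        return "normal"
    return None                  # status other, with power: nothing is predicted


def predicted_frequency(chip_mode: Optional[str]) -> Optional[int]:
    """An inactive chip has output frequency zero; otherwise no prediction here."""
    return 0 if chip_mode == "inactive" else None


def refutes_zerof(observed_frequency: float) -> bool:
    """A nonzero observed frequency is inconsistent with status zerof."""
    return observed_frequency != predicted_frequency(mode("zerof", True))
```

Keeping the mode as an intermediate step means the zero-frequency rule is written once and shared by the faulty case and the unpowered case.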

6.3  Principles  for  Using  Syndromes

There  are  two  situations  in  which  it  is  advantageous  to  represent  syndromes 
and  misbehaviors  explicitly:  (i)  when  there  are  functional  components  that 
have faults with unusually high likelihoods, or (ii) when the resulting misbehavior
is drastically simpler than the correct behavior.

Faults  with  high  likelihood  are  worth  including  explicitly.  It  is  useful  to 
know  about  very  likely  failures  because  if  a  particular  component  is  one  of 
many  suspected  of  failure,  but  (say)  99%  of  the  failures  in  components  of 
that  type  produce  a  behavior  other  than  the  one  being  observed,  then  that 
component  is  almost  certainly  not  the  culprit. 

One  of  the  most  common  failures  in  the  field  occurring  in  digital  circuits  is 
the  disconnection  of  a  bonding  wire.  In  BASIL,  bonding  wires  are  considered 
part  of  pins.  The  effect  of  breaks  in  them  is  to  make  the  pin  act  as  an 
open  circuit.  Thus  one  of  the  syndromes  for  pins  is  termed  open,  and  its 
behavioral  impact  is  to  make  the  currents  into  both  ends  of  the  pin  be  0 
(the  signal  (qci  ?port)  denotes  the  sign  of  the  current  into  ?port  and  is 
discussed  in  Appendix  E): 

If   [conn ?pin (hole ?i ?e) ?port]
and  [status-of ?pin open]
Then [thru -oo +oo (qci (hole ?i ?e)) 0]
and  [thru -oo +oo (qci ?port) 0]

For  example,  if  the  externally  visible  node  of  this  pin  is  connected  to  a 
pullup  and  should  be  pulled  down  via  this  pin,  and  the  node  is  at  logic  level 
0,  then  the  pin  is  probably  not  faulty.  This  is  because  if  the  pin  were  open, 
the  node  would  be  pulled  up  to  1. 

The likelihood of a pin working was earlier set to .9999^2 = .9998; the
likelihood  of  it  having  status  open  is  set  to  .0002.  This  makes  the  other 
status  have  likelihood  0: 


Pin Status   Likelihood
working      0.9998
open         0.0002
other        0.0


Thus the pin is an extreme example of a component with a “likely” syndrome:
it accounts for 100% of the failures in pins. It is exceptional in
that respect; no other component has such a syndrome. The point stands,
however, that the syndrome is useful to know about precisely because it is so likely.

Faults that drastically simplify behavior are worth including explicitly.
One  kind  of  “drastic  simplification”  of  behavior  is  when  the  faulty  component 
produces  a  constant  output  for  all  time,  instead  of  responding  to  changes  on 
its  inputs. 

For  example,  a  common  failure  is  that  crystal  oscillators  crack  or  become 
loose  in  their  casings;  the  result  is  that  the  output  does  not  oscillate,  but 
instead  stays  constant: 

If   [isa ?o oscillator]
and  [thru ?1 ?u (mode ?o) inactive]
and  Signal (fww ?w ?c (11 (out 0 ?o))) exists
Then [thru ?1 ?u (fww ?w ?c (11 (out 0 ?o))) 0]

Thus  for  example,  if  the  output  of  the  oscillator  is  active  it  is  probably  not 
faulty.  The  syndromes  and  their  likelihoods  are  based  on  the  presumptions 
that  oscillators  fail  about  50  times  as  often  as  pins,  and  that  there  is  a  nonzero 
likelihood  that  the  oscillator  may  fail  in  other  ways: 

Oscillator Status   Likelihood   Description
working             0.99         = .9999^100
open                0.0099       = 100 × ((1 - .9999) × .9999^99)
other               0.0001

The  syndrome  is  useful  because  the  misbehavior  that  results  is  simple  and 
sufficiently different from what is expected that it does not require much additional
reasoning to detect whether it is consistent with observations or not.
Had  the  syndrome  been  that  the  oscillator  (say)  skipped  every  hundredth 
cycle,  a  detailed  model  of  behavior  would  have  been  required  to  represent 
it,  and  the  available  observations  would  not  have  been  able  to  distinguish  it 
anyway.  Such  misbehaviors  are  usually  better  dealt  with  at  the  lower  levels 
of  physical  and  behavioral  detail  from  which  they  originated. 

Useful syndromes have both of these properties: they are common and they
simplify behavior. In the case of the pin and oscillator these properties
are achieved because of the physical simplicity of the components. These
properties can also be achieved in functional components with more
internal structure and complex behavior. Syndromes can have high
likelihood if many internal faults produce the same overall misbehavior.
Faults can cause the behavior to be drastically simplified if they
dominate all the outputs of the component, or if they lie on internal
sequential feedback paths so that the effects of local misbehaviors
aggregate and cascade. Thus, if there are several faults that cause the
same misbehavior, and the misbehavior is simpler than the normal behavior
(by having fewer reachable states, for example), then those faults
constitute a useful syndrome.

Consider for example the burst detector in the Audio Decoder
(Figure 6.2). Eighteen clock cycles after the start signal falls, the
output Hsb is asserted for one cycle.

Figure  6.2:  Audio  Counter 


The internal structure of the burst detector involves three chips: two
four-bit counters U10 and U11, and a quad NOR gate chip U20. Any of the
three chips U10, U11, or U20 could fail in ways that prevent the burst
detector from ever starting to count, so that Hsb would always be 0. For
example, there are three pins in U20 that if open would cause the Load
signal to be stuck at 1, the result being that counting would never
start [3]. Thus each of the three chips has a syndrome denoted
csb-inactive, and if any of them has that status then the burst detector
is inactive:

    If   [status-of ?u csb-inactive]
    and  (member ?u '(u10 u11 u20))
    Then [status-of csb01 inactive]

If  the  burst  detector  is  in  inactive  mode  then  both  its  outputs  are  0: 

    If   [isa ?csb clocked-serial-burst-detector]
    and  [thru ?1 ?u (mode ?csb) inactive]
    Then [thru ?1 ?u (11 (out wr ?csb)) 0]
    and  [thru ?1 ?u (11 (out clk ?csb)) 0]

For each of the three chips, the likelihood of each syndrome occurring is
estimated from the likelihood of failures in the pins. For example, the
likelihood of U10 working is .9999^32, the likelihood that all 16 pins are
working. The likelihood of U10 having syndrome csb-inactive is
3 x (.0002 x .9999^30), the likelihood that the chip has exactly one of
the three single-pin faults that cause csb-inactive. The likelihood of
other is just the residual:

U10 Status     Likelihood   Description
working        0.997        All 16 pins working
csb-inactive   0.0006       Any of 3 pins open
other          0.0024

For U11, there are 4 open pin faults that can cause the syndrome:

U11 Status     Likelihood   Description
working        0.997        All 16 pins working
csb-inactive   0.0008       Any of 4 pins open
other          0.0022

For  U20,  5  open  pin  faults  can  cause  it: 

U20 Status     Likelihood   Description
working        0.997        All 14 pins working
csb-inactive   0.001        Any of 5 pins open
other          0.002

[3] This was checked by SSIM, a simple event-driven digital simulator that
uses BASIL as its structure description language.

The impact of this syndrome is that if it can be shown that it is
inconsistent for the clocked serial burst detector to be inactive, then
the likelihoods of candidates involving U10, U11, and U20 will be reduced
somewhat, each by about one-fourth. The likelihoods of syndrome
csb-inactive appearing in each of the three chips do not differ by enough
to have any significant impact on the likelihoods of candidates containing
U10, U11, and U20 relative to one another.
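
The "about one-fourth" figure can be checked with a short calculation. The
sketch below is only one reading of the claim: it assumes that ruling out
csb-inactive removes that syndrome's share of each chip's total failure
likelihood, and it takes the numbers from the three status tables. The
function name failure_mass_removed is illustrative and not part of XDE.

```python
# Likelihoods copied from the U10, U11, and U20 status tables.
STATUS_TABLES = {
    "U10": {"working": 0.997, "csb-inactive": 0.0006, "other": 0.0024},
    "U11": {"working": 0.997, "csb-inactive": 0.0008, "other": 0.0022},
    "U20": {"working": 0.997, "csb-inactive": 0.001, "other": 0.002},
}


def failure_mass_removed(table, syndrome):
    """Fraction of a chip's failure likelihood carried by one syndrome."""
    failure = sum(p for status, p in table.items() if status != "working")
    return table[syndrome] / failure


fractions = {
    chip: failure_mass_removed(table, "csb-inactive")
    for chip, table in STATUS_TABLES.items()
}
for chip, fraction in fractions.items():
    print(f"{chip}: {fraction:.2f}")
```

Under this reading the three chips lose between one-fifth and one-third of
their failure likelihood, which averages to roughly one-fourth.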

Another example in the Audio Decoder is the Manchester-to-serial decoder;
it is a sequential circuit entirely encapsulated within the chip U12.
When the chip U12 has status mts-inactive then MTS01 has status inactive
as a consequence:

    If   [status-of u12 mts-inactive]
    Then [status-of mts01 inactive]

In the inactive mode, the serially encoded output of MTS01 has zero
amplitude:

    If   [isa ?mts manchester-to-serial]
    and  [thru ?1 ?u (mode ?mts) inactive]
    and  Signal (max-min-ww ?w (cs (out y ?mts))) exists
    Then [thru ?1 ?u (max-min-ww ?w (cs (out y ?mts))) 0]

The  likelihood  of  each  syndrome  for  U12  is  based  on  the  fact  that  U12 
has  20  pins,  faults  in  9  of  which  can  cause  the  syndrome  mts-inactive: 

U12  Status  Likelihood  Description 

working  .996  All  20  pins  working 

mts-inactive  .0018  Any  of  9  pins  open 

other  .0022 
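
The chip likelihoods in this section can be approximately reproduced from
the pin likelihoods. The following is a minimal sketch, assuming that pins
fail independently, that each pin is open with likelihood 0.0002 (the pin
status table), and that a syndrome corresponds to exactly one of its
listed pins being open. The names chip_likelihoods and CHIPS are
illustrative only.

```python
PIN_OPEN = 0.0002           # likelihood that a single pin is open
PIN_WORKING = 1 - PIN_OPEN  # likelihood that a single pin is working


def chip_likelihoods(n_pins, n_syndrome_pins):
    """Return (working, syndrome, other) likelihoods for one chip."""
    # Every pin must be working for the chip to be working.
    working = PIN_WORKING ** n_pins
    # Exactly one of the syndrome-causing pins is open; the rest work.
    syndrome = n_syndrome_pins * PIN_OPEN * PIN_WORKING ** (n_pins - 1)
    # Everything else is lumped into the residual status "other".
    other = 1 - working - syndrome
    return working, syndrome, other


# (number of pins, number of pins whose open fault causes the syndrome)
CHIPS = {"U10": (16, 3), "U11": (16, 4), "U20": (14, 5), "U12": (20, 9)}

for chip, (pins, syndrome_pins) in CHIPS.items():
    working, syndrome, other = chip_likelihoods(pins, syndrome_pins)
    print(f"{chip}: working={working:.3f} syndrome={syndrome:.4f}")
```

The working and syndrome columns round to the tabulated values. The
unrounded residual is slightly larger than the tabulated "other" entries,
which appear to be taken as the residual after rounding.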
6.4  Consequences  of  Using  Syndromes

By helping to discount unlikely misbehaviors, syndromes help a
troubleshooting engine to ask for fewer observations, and this in turn
makes troubleshooting complex digital circuits more feasible. For example,
in the Audio Decoder one of the cases requires 9 observations without
syndromes to arrive at a single-fault diagnosis, but 2 observations to
arrive at the same diagnosis if the syndromes are included. Since the cost
of making observations is generally assumed to be greater than that of
extra computation, even more modest gains are worthwhile. The reduced
number of observations is possible because the syndromes reduce the
relative likelihoods of faults in the Manchester-to-serial converter and
in the burst detector, and the troubleshooting engine is generally biased
away from suggesting observations in the vicinity of components that are
judged unlikely to be causing the observed symptoms.

Knowledge about how components misbehave is essential in troubleshooting
complex circuits because the number of logically possible (but unlikely)
misbehaviors and the amount of detail in the observations needed to track
them down are so large. The effectiveness of fault models in providing
focus stems from two sources, one general and one specific to digital
systems. First, sometimes it is much easier to reason forward from causes
to their effects than the reverse. The consequence is that it is easier to
consider the ways a component might plausibly misbehave and rule them out
individually than to try to rule out all of them logically at once.
Second, some behaviorally complex digital components have many internal
faults that all result in the same few temporally abstract misbehaviors.
The beneficial consequence is that if these few misbehaviors can be ruled
out, the complex component will be judged an unlikely candidate.

As an example of the problems that result from the inability to reason
from effects to causes, consider Figure 6.3. It shows a microprocessor
dedicated to running a program that multiplies the contents of two
external registers R1 and R2 and writes the result to R3. If the
troubleshooter observes that the output register R3 has bit 3 consistently
wrong, it will suggest not only that this register might be broken, but
that the microprocessor, the read-only memory where its instructions are
stored, the clock generator that runs the processor, and so on, could all
be broken. Intuitively, these other candidates are implausible; it might
be logically possible for the microprocessor to be doing arithmetic
incorrectly, or for the clock to be skipping cycles, or for some
instruction to be slightly wrong, but if these things were happening the
observed misbehavior would probably be much more drastic than just the one
wrong bit. For example, if the microprocessor is adding numbers wrong it
is likely to make a wild branch to a location containing an illegal
instruction. If it could be inferred from observations of the outputs of
the microprocessor that its instructions from the ROM were correct, or
that the clock output was correct, those candidates would not get
proposed. But logically speaking such inferences are unfounded, because
the misbehavior could in principle arise that way; it is just very
unlikely.

Figure 6.3: Every Component is a Candidate

The microprocessor example also illustrates why knowledge about syndromes
is useful in complex digital circuits. A discrepancy at the output of R3
in principle implicates the microprocessor, ROM, and clock generator, and
requires observations to determine whether the clock is running or not,
whether all the ROM locations have the right value, and so on. But
experienced human troubleshooters would examine the inputs and outputs of
the registers first, and probably find the problem there very quickly.
Experienced troubleshooters, upon seeing a digital circuit perform some
function correctly, tend to exonerate (at least temporarily) the complex
portions of the circuit. The usual expectation is that any failure there
will result in a catastrophic rather than a subtle misbehavior. Sequential
circuits tend to have “inactive” syndromes associated with them, and
because the circuit did something, that syndrome is ruled out. In the
present example, the microprocessor gets exonerated because the output of
register R3 is at least changing. In other domains this heuristic might
not work, for example in analog domains in which failures usually have
more subtle effects. The context, however, is troubleshooting digital
circuit boards, and many of the failures there are not at all subtle.

The reason that digital circuits misbehave this way stems from aspects of
their design. Complex functions tend to get implemented in state machines
or as firmware for general processors. The circuits then use the same
hardware components over and over to implement different steps of the
overall computation, many of which depend on the previous step. Hence a
perturbation caused by failure in any one unit of hardware rapidly
cascades and propagates its effects. The very economy of the design (the
reuse of hardware for different substeps of a complex behavior) means
that after many cycles the behavior will little resemble that intended.
Complex components communicate with one another through protocols and
languages in which the meaningful message sequences occupy only a fraction
of the theoretically available bandwidth. Hence when a faulty component is
intended to produce a message sequence understandable by some other
component, the message will probably never get through. To extend the
example, suppose the microprocessor must initialize some slave hardware by
setting up sixteen eight-bit registers one at a time. If the master
processor makes even one wild branch, or one bit is stuck on the data bus,
the likelihood that the slave got a correct initialization message is
rather slim.

Fault  models  are  thus  a  powerful  form  of  heuristic  in  troubleshooting 
complex  digital  circuits,  both  because  of  the  general  property  that  they  tend 
to  focus  the  model-based  troubleshooting  program  on  likely  failures,  and 
because  of  the  specific  property  that  the  design  of  the  digital  circuits  means 
that  they  can  be  treated  as  unlikely  suspects  if  they  perform  even  a  portion 
of  their  intended  behavior.  As  the  behavioral  complexity  of  field  replaceable 
components  increases,  the  more  valuable  this  latter  phenomenon  becomes, 
since  the  model-based  troubleshooting  program  can  thereby  avoid  having  to 
reason  in  detail  about  their  internal  structure  and  behavior. 
6.5  Summary  of  Faults  and  Misbehaviors

Experience  with  model-based  troubleshooting  has  shown  that  with  increasing 
behavioral  complexity,  approaches  that  avoid  the  use  of  fault  models  have 
little  utility  in  the  real  world  because  the  problem  of  isolating  a  component 
in the face of limited observability and behavioral complexity is often
inherently underconstrained [Hamscher84]. Ideally there is unlimited
observability, every component has behavior that is easy to manipulate
algebraically, and computation is so cheap that competing diagnoses can be
discriminated through computationally intensive techniques such as
exhaustive case splitting over finite fields of values. For devices of any
interesting complexity these are not realistic approaches. A partial
solution is to limit consideration of diagnoses to those that are
plausible, rather than considering all
that  are  logically  possible.  With  this  more  limited  goal,  fault  models  can  be 
seen  as  heuristics  for  refining  estimates  of  component  failure  likelihoods.  In 
BASIL  and  TINT,  fault  models  are  called  syndromes,  and  have  both  physical 
and  functional  aspects.  The  syndromes  to  be  included  for  each  component 
type  are  chosen  on  the  grounds  of  likelihood  and  simplicity:  they  should 
account  for  a  significant  fraction  of  failures  in  components  of  that  type,  and 
they  should  result  in  drastically  simplified  behaviors.  While  total  reliance 
on  fault  models  for  automated  diagnosis  has  serious  drawbacks,  it  does  not 
follow  that  they  have  no  role  in  model-based  troubleshooting.  In  the  case  of 
digital  circuits  in  particular,  fault  models  turn  out  to  be  powerful  heuristics 
because  the  very  design  of  complex  digital  systems  means  that  fault  effects 
result  in  misbehaviors  that  are  catastrophic,  easy  to  detect,  and  easy  to  rule 
out. 
Chapter  7

Troubleshooting 


The  representations  of  structure  and  behavior  discussed  in  earlier  chapters 
are  heavily  influenced  by  their  intended  use  in  model-based  troubleshooting, 
in particular, by their intended use with the troubleshooting engine XDE [1].
Like  GDE  [deKleer87],  XDE  works  by  (i)  tagging  each  prediction  made  by 
the  behavior  model  with  its  set  of  supporting  assumptions,  (ii)  recording 
conflicts  among  the  consequences  of  these  assumptions,  (iii)  constructing  the 
set  of  candidates  (some  possibly  indicating  multiple  faults)  as  the  minimal 
covering  sets  of  those  conflicts,  and  (iv)  suggesting  as  the  next  observation 
the  one  expected  to  most  reduce  the  uncertainty  among  the  set  of  candidates. 

XDE extends this procedure by adding two new operations that can be
performed before suggesting new observations: decomposition, which enables
hierarchic diagnosis, and refinement, which enables the use of fault
models. Decomposition and refinement are integrated into the procedure
with decomposition having priority over refinement, which in turn has
priority over probe selection. XDE constructs candidates that are assigned
weights according to their relative likelihood. Those with weight above
10% are eligible for refinement and decomposition. After each new
observation, XDE finds the most likely candidate and refines it.
Refinement involves selecting the most likely syndrome for a component
believed faulty in that candidate, and predicting the effects of that
syndrome. If there is no such refinement operation available, it
decomposes a component instead. If no diagnosis is eligible for this
either, it suggests a probe.

[1] extended Diagnostic Engine.

This chapter presents XDE and its interaction with the representation
choices made in the structure language BASIL and behavior language TINT.

7.1  Conflicts  and  Candidates 

XDE inherits the terminology of assumptions, environments, conflicts, and
candidates from GDE, interacting with BASIL and TINT mainly through
status-of predications.

In TINT an assumption is a unit clause supporting one predication. For
example, let U32w denote the assumption that chip U32 has the status
“Working.” U32w is a unit clause attached to the single predication
[status-of U32 working] (top of Figure 7.1).


Figure  7.1:  Predications,  Assumptions,  and  Environments 
Environments are sets of assumptions; for example, {U32w} is the
environment in which chip U32 is assumed to be working. The predication
[status-of U32 working] could be true in more than one environment, and
the set of environments in which it is true is called its label. For
example, there could be another assumption that the entire board is
working; [status-of U32 working] would be true also in the singleton
environment consisting of that assumption.

A clause is a disjunction of predications. When a clause is installed
connecting two or more predications, some predications may become true in
new environments. For example, inverter chiplet U32a is part of chip U32.
The clause ¬[status-of U32 working] ∨ [status-of U32a working] would be
installed (middle of Figure 7.1), since if U32 is working then all its
subparts including U32a must be working. Because of this clause,
[status-of U32a working] would become true in the environment {U32w}.

TINT rules fire on predications and make deductions in the form of
(usually new) predications. Each firing results in the installation of a
clause connecting the old predications to the new predication. For
example, suppose a rule fired to deduce that the mode of U25a was normal,
and installed a clause to that effect. The new predication would then be
true in the environment that was a union of the environments of the old
predications (bottom of Figure 7.1). Ultimately, any consequence of
assuming that U25 is working will have some superset of the environment
{U32w} in its label.

If TINT makes two different deductions about the value of a signal at a
certain time, a conflict is recorded. At least one of the assumptions
underlying those deductions must be false. The conflict is the union of
the environments of the contradictory deductions and is denoted <...>. For
example, if a certain signal was supposed to be 1 in the environment
{U25w, U32w} but it was supposed to be 0 in the environment {U30w}, then
the union <U25w, U32w, U30w> is a conflict.
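
The bookkeeping in this example can be sketched with sets. This is only an
illustration of the union operations described in the text, assuming
environments are modeled as frozensets of assumption names; the functions
deduce and conflict are hypothetical helpers, not TINT or XDE operations.

```python
def deduce(*antecedent_envs):
    """Environment of a deduction: union of its antecedents' environments."""
    return frozenset().union(*antecedent_envs)


def conflict(env_a, env_b):
    """Conflict formed from the environments of two contradictory deductions."""
    return frozenset(env_a) | frozenset(env_b)


# The signal is predicted to be 1 by a deduction resting on U25w and U32w,
# and predicted to be 0 by a deduction resting on U30w alone.
env_one = deduce({"U25w"}, {"U32w"})
env_zero = frozenset({"U30w"})

recorded = conflict(env_one, env_zero)
print(sorted(recorded))
```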

All of the assumptions that XDE makes are about the statuses of physical
components, hence all the candidates that it produces are sets of physical
components corresponding to repairs. For example, if the above conflict
were the only conflict known, then the candidates are its minimal covering
sets [U25w], [U32w], and [U30w]. At least one of the chips U25, U32, or
U30 needs to be replaced.
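
Constructing candidates as minimal covering sets can be illustrated with a
brute-force search. This is a sketch for small examples only, assuming
conflicts are given as sets of assumption names; it enumerates subsets by
increasing size and is not the incremental method a real engine such as
GDE or XDE would use. The function name minimal_covering_sets is
illustrative.

```python
from itertools import combinations


def minimal_covering_sets(conflicts):
    """Return the minimal sets that intersect every conflict."""
    conflicts = [frozenset(c) for c in conflicts]
    universe = sorted(set().union(*conflicts)) if conflicts else []
    found = []
    # Enumerate by increasing size so smaller covers are found first.
    for size in range(1, len(universe) + 1):
        for combo in combinations(universe, size):
            candidate = frozenset(combo)
            if any(smaller <= candidate for smaller in found):
                continue  # contains an already-found cover: not minimal
            if all(candidate & c for c in conflicts):
                found.append(candidate)
    return found


single = minimal_covering_sets([{"U25w", "U32w", "U30w"}])
double = minimal_covering_sets([{"A", "B"}, {"B", "C"}])
print([sorted(c) for c in single])
print([sorted(c) for c in double])
```

With the single conflict from the text, each assumption alone covers it.
With the two hypothetical conflicts in the second call, one candidate is a
single fault and the other is a double fault.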
7.2  Decomposition

In hierarchic diagnosis, a component suspected of being faulty can be
decomposed to reveal its subcomponents. The decomposition of a component
involves two conceptually separate operations: (i) firing the behavior
rules for the subcomponents, which usually refer to signals at a different
level of abstraction than that of their parent, and (ii) making the
troubleshooting engine entertain fault hypotheses about each individual
subcomponent, rather than about the parent. In traditional hierarchic
diagnosis these two operations are usually considered identical. That
works fine within a single strict hierarchy, as in HT [Davis84] and DART
[Genesereth84]. To deal with the physical and functional hierarchies in
BASIL, however, it is advantageous to draw a distinction between the two
operations.

To make the TINT behavior rules for a certain component fire requires
creating an explicit status-of predication for it. This operation is
called instantiation. Instantiating inverter U32a, for example, creates
the predication [status-of U32a working]. Rules about the mode and
behavior of U32a will only fire after [status-of U32a working] becomes
true. Since U32a is a part of chip U32, if U32 is believed to be working
then U32a should be believed to be working too. Thus a clause linking the
two is installed, as illustrated earlier in Figure 7.1. Also, the parent
component should be believed to be working if all its subcomponents are.
When all of the subcomponents of a parent component have been
instantiated, another clause is installed that makes the parent status-of
predication true if all the subcomponent status-of predications are.

After instantiating all of the subcomponents of a parent component, XDE
will not construct candidates involving those subcomponents until an
assumption (unit clause) has been created for each of them. After being
created these new assumptions will then appear in the labels of some
predictions about the behavior of the device, will appear in conflicts,
and thus will appear in candidates. The parent component will have the
status working in the environment consisting of all the assumptions about
its subcomponents (unless that environment is itself a conflict). Any
assumptions about the original parent component are no longer needed and
can be deleted.* This operation is called assumption splitting: any
assumptions about the status of a component are deleted and one assumption
is created for each of its instantiated subcomponents.

* The binary clauses of the form ¬parent ∨ child are deleted too, a detail
that improves the efficiency of the TMS.

Suppose devices were represented using only one component hierarchy. If a
top-level component were a candidate, then its subcomponents would be
instantiated and some rules would run. Then the assumption that the
component is working would be split and new conflicts would be discovered
involving its subcomponents. Some of the subcomponents would then appear
in candidates. Each of these could then be treated recursively: their
subcomponents instantiated and their assumptions split.

It is helpful for all the assumptions present at any moment to be
independent, since this simplifies candidate ranking. If the hierarchy is
not guaranteed to be strict, it takes extra work to ensure that each pair
of assumptions is independent, since any pair of assumptions might refer
to two components that share subparts. If the hierarchy is strict, at each
descending step it is easy to guarantee that this never happens. Thus it
is useful to locate assumptions only within strict part-of hierarchies.

Now suppose that there are two hierarchies and that there is no obvious
correspondence between nodes in the two. BASIL, for example, has physical
components and functional components in separate hierarchies that meet at
their leaves. Figures 7.2 through 7.4 show an example. There are two
boards A and B, each having several chips. Three of the chips on A and two
of the chips on B form a single four-bit adder. The four-bit adder is
composed of two two-bit adders tb1 and tb2. Each two-bit adder is composed
of two full-adders, each full-adder is composed of two half-adders and an
OR gate, and each half-adder is composed of an AND gate and an XOR gate.
Each of the full-adders fa1 through fa4 is distributed across three chips:
a quad AND gate chip, a quad XOR gate chip, and a quad OR gate chip.

In BASIL, assumptions about the status of components are attached to
physical components. This suggests that the diagnosis proceed top-down
through the physical hierarchy, always staying as high as possible.
However, TINT behavior rules are attached only to the components in the
functional hierarchy. While descending through the physical hierarchy, it
makes sense to fire the behavior rules for ever more detailed functional
components. Since there is no obvious correspondence between the
components in the two hierarchies (Figure 7.5), there is a coordination
problem: how deep into the functional hierarchy should components be
instantiated for each newly split assumption in the physical hierarchy?

For each physical component, there is some functional component that fully
contains it. For example, chips QA1 and QX1 are fully contained within the
two-bit adder tb1. Chip QO1 is fully contained only within the whole
four-bit adder. When chip QA1 has an assumption attached to it so that it
can appear in diagnoses, rules should at least be getting run for every
component that fully contains it. But this is not deep enough, since there
would never be enough behavioral detail to distinguish between diagnoses
involving that physical component and others contained by the same
functional component. For example, if only the rules at the level of
two-bit adders were being run, there would be no way to detect a conflict
in which QA1 appeared but QX1 did not. This is because QA1 and QX1 must
both be working for either of the two-bit adders to be working. Going one
level deeper in the functional hierarchy would not help: at the level of
full-adders there is still no way to find a conflict involving QA1 but not
QX1, since both must be working for full-adders fa1 and fa2 to work. Going
one level deeper in the physical hierarchy, however, would help: with QA1
assumed to be working, rules would be run for any components that fully
contain any of its subcomponent AND gates a1, a2, a3, or a4. In this case,
the corresponding functional components happen to be the gates themselves,
and the behavior rules at the level of gates have enough behavioral detail
to detect conflicts involving QA1 without involving QX1.

Figure 7.5: Physical and Functional Decompositions of the Four-Bit Adder

This yields the criterion that XDE uses to decide how deep in the
functional hierarchy to run rules, given a certain level of assumption in
the physical hierarchy: instantiate all functional components that fully
contain any immediate physical subcomponent. A physical component is
“fully contained” if it is a physically maximal part-of the functional
component, abbreviated xpart-of. The xpart-of relation holds between each
physical component and zero or more functional components (Figure 7.6). A
physical component is a physically maximal part of a functional component
when all its subcomponents help to implement that functional component but
the same is not true of its parent. Strictly speaking, all the leaf
ppart-of descendants of the physical component must be leaf fpart-of
descendants of the functional component, while this must not hold for the
parent of the physical component. For example, QA1 is xpart-of tb1 because
all of its leaf subcomponents are leaf subcomponents of tb1, but the same
is not true of the parent of QA1, Board A. Hence if Board A were assumed
to be working, tb1 would be instantiated, because QA1 is an immediate
physical subcomponent of Board A and is xpart-of tb1. The children of tb1
would not be instantiated.
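
The xpart-of criterion can be sketched over two small trees. Everything in
this model is an assumption made for illustration: the text does not name
the chips on Board B (QA2 and QX2 here are hypothetical), the extra chips
UA9 and UB9 stand in for the boards' other chips outside the adder, and
the assignment of gates to half-adders is a guess. Only non-leaf
functional components are considered, so the gates themselves are not
reported.

```python
# Physical hierarchy (ppart-of): parent -> children.
PHYSICAL = {
    "A": ["QA1", "QX1", "QO1", "UA9"],  # UA9: hypothetical non-adder chip
    "B": ["QA2", "QX2", "UB9"],         # QA2, QX2, UB9: hypothetical names
    "QA1": ["a1", "a2", "a3", "a4"],
    "QX1": ["x1", "x2", "x3", "x4"],
    "QO1": ["o1", "o2", "o3", "o4"],
    "QA2": ["a5", "a6", "a7", "a8"],
    "QX2": ["x5", "x6", "x7", "x8"],
}

# Functional hierarchy (fpart-of); the gate assignment is assumed.
FUNCTIONAL = {"adder": ["tb1", "tb2"], "tb1": ["fa1", "fa2"], "tb2": ["fa3", "fa4"]}
for i in range(1, 5):
    FUNCTIONAL[f"fa{i}"] = [f"h{2 * i - 1}", f"h{2 * i}", f"o{i}"]
for j in range(1, 9):
    FUNCTIONAL[f"h{j}"] = [f"a{j}", f"x{j}"]


def leaves(tree, node):
    """Leaf descendants of node (a node without children is its own leaf)."""
    children = tree.get(node)
    if not children:
        return {node}
    return set().union(*(leaves(tree, child) for child in children))


def parent_of(tree, node):
    for parent, children in tree.items():
        if node in children:
            return parent
    return None


def xpart_of(physical):
    """Functional components containing `physical` but not its parent."""
    mine = leaves(PHYSICAL, physical)
    parent = parent_of(PHYSICAL, physical)
    above = leaves(PHYSICAL, parent) if parent else None
    result = set()
    for functional in FUNCTIONAL:  # non-leaf functional components only
        func_leaves = leaves(FUNCTIONAL, functional)
        if mine <= func_leaves and (above is None or not above <= func_leaves):
            result.add(functional)
    return result


def to_instantiate(assumed):
    """Functional components instantiated when `assumed` carries an assumption."""
    return set().union(*(xpart_of(c) for c in PHYSICAL.get(assumed, [])))


print(sorted(to_instantiate("A")))
print(sorted(to_instantiate("QA1")))
```

Under these assumptions, an assumption on Board A leads to instantiating
the adder and tb1 but none of tb1's children, and an assumption on QA1
leads down through the full-adders and half-adders of tb1.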

Figure 7.6: XPART-OF Relations in the Four-Bit Adder

There is one further complication, which is that for each layer of
physical detail, there may be several layers of functional detail, and XDE
proceeds through the functional detail one level at a time. The
“decomposition” operation may thus be applied to the same physical
component more than once, although sometimes it will result in functional
components being instantiated, and other times in splitting of
assumptions. The table below shows an example of the order in which XDE
would intersperse assumption splittings and component instantiations.


       All Existing                              New Instantiations
Step   Assumptions                               of Functional Components

1.     A, B
2.                                               adder
3.                                               tb1, tb2
4.     QA1, QX1, QO1, B
5.                                               fa1, fa2
6.                                               h1, h2, h3, h4, o1, o4
7.                                               a1, a2, a3, a4, x1, x2, x3, x4
8.     A1, A2, A3, A4, QX1, QO1, B
9.     A1, A2, A3, A4, X1, X2, X3, X4, QO1, B


Step 1: both boards are assumed working and no components are
instantiated. Step 2: the adder is instantiated. Suppose the conflict
<A, B> results. Now [A] and [B] are candidates. Step 3: the subcomponents
of the adder, tb1 and tb2, are instantiated. No further progress can be
made in the functional hierarchy. Step 4: split the assumption that A is
working. The conflict <A, B> is replaced by <QA1, QX1, QO1, B>. Now [QA1],
[QX1], [QO1], and [B] will be candidates. Steps 5 through 7: instantiate
functional components all the way to the level of gates, within the
full-adders fa1 and fa2. Suppose the conflict <QA1, QX1> is discovered.
Now [QA1] and [QX1] are candidates. Step 8: split the assumption QA1;
Step 9: split the assumption QX1. There are no instantiations to do, since
the gates were primitives.


7.3 Ranking and Refinement

The  ranking  of  candidates  in  XDE  takes  syndromes  into  account.  The  method 
is  an  extension  of  the  candidate  ranking  method  discussed  in  the  previous 
chapter. 

Without  syndromes,  candidate  ranking  works  as  follows.  Each  compo¬ 
nent  is  assigned  a  prior  probability  that  it  is  working  based  on  an  estimate 
of  its  physical  complexity.  Assuming  independence  among  failures  in  all 
components,  the  probability  of  a  candidate  is  thus  the  probability  that  all 
components  have  just  the  status  assigned  in  that  candidate.  For  exam¬ 
ple,  the  candidate  [U25]  assigns  the  status  “other”  to  U25  and  “working” 
to  the  other  components.  The  probability  assigned  this  candidate  is  then 
(1 - p(U25)) × p(U30) × p(U32). All candidates are then assigned a weight
that  is  their  probability  normalized  with  respect  to  all  the  minimal  candi¬ 
dates.  This  scheme  yields  intuitively  satisfying  results,  since  candidates  in¬ 
volving  single  faults  are  generally  more  likely  than  those  with  multiple  faults, 
and  the  candidates  with  the  highest  weights  are  those  involving  components 
with  higher  failure  rates. 
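
The syndrome-free ranking described above can be sketched in a few lines of
Python. This is only an illustration: the prior values are invented for the
example and are not taken from the thesis, and the function names are
hypothetical. Only the product form and the normalization follow the text.

```python
from math import prod

# Hypothetical prior probabilities that each component is working.
p_working = {"U25": 0.990, "U30": 0.997, "U32": 0.998}

def candidate_probability(candidate, priors):
    """Probability that exactly the components in `candidate` have failed,
    assuming independent failures."""
    return prod((1 - p) if name in candidate else p
                for name, p in priors.items())

def candidate_weights(candidates, priors):
    """Normalize candidate probabilities over the given minimal candidates."""
    probs = {c: candidate_probability(c, priors) for c in candidates}
    total = sum(probs.values())
    return {c: pr / total for c, pr in probs.items()}

# The three single-fault candidates of the clock generator example.
candidates = [frozenset({"U25"}), frozenset({"U30"}), frozenset({"U32"})]
weights = candidate_weights(candidates, p_working)
```

With these invented priors the candidate [U25] receives the largest weight,
simply because U25 was given the highest failure probability.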

In  XDE,  components  can  have  statuses  other  than  simply  working  or  not 
working,  so  there  will  be  more  candidates  and  a  more  elaborate  ranking 
function.  The  benefit  of  the  additional  complexity  and  expense  is  that  the 
troubleshooting  engine  exhibits  better  focusing.  When  candidates  involving 
syndromes  are  shown  to  be  inconsistent  with  observations,  other  candidates 
will  appear  more  likely,  and  the  troubleshooting  engine  will  focus  its  efforts 
on  those  likelier  candidates. 

To use syndromes, XDE refines candidates by installing assumptions of
the form “physical component X is exhibiting syndrome S,” denoted X_S. For
example, oscillator chips have the syndrome inactive; an assumption that
oscillator U25 is inactive is denoted U25_inactive. Because the statuses
of a component are mutually exclusive, creating this assumption would result
in the conflict (U25_w, U25_inactive). The assumption that U25 is inactive
results in the prediction that its output will have frequency zero, which in
turn has other consequences. Usually, new conflicts involving U25_inactive
will be discovered. Candidates are still constructed as the minimal covering
sets of conflicts, but to deal with syndromes it is necessary to consider the
complements of the candidates, the maximal consistent environments. A maximal
consistent environment is one to which no assumption can be added without
making it inconsistent. There is one maximal consistent environment per
candidate. XDE constructs diagnoses from maximal consistent environments
as illustrated by example below.

Figure 7.7: Clock Generator

Consider a version of the clock generator troubleshooting example, shown
in Figure 7.7. The three field replaceable components are the chips U25, U30,
and U32. To better illustrate the refinement operation, assume that (i) chips
are primitives and etches do not fail, and (ii) all antibehavior rules are
disabled. The initial symptom that (out q u30b) is a constant 1 instead of
having frequency 2.5 MHz yields the conflict (U25_w, U30_w, U32_w), meaning
that one of these components is faulty. Refining the candidate [U25_w]
with the syndrome U25_inactive yields the conflict (U25_w, U25_inactive) as
well. U25_inactive is consistent with the observations and with U30 and U32
working properly. The minimal covering sets of these two conflicts are
[U25_w], [U32_w, U25_inactive], and [U30_w, U25_inactive]. The maximally
consistent environments are their complements {U30_w, U32_w, U25_inactive},
{U25_w, U30_w}, and {U25_w, U32_w} respectively. Each maximally consistent
environment denotes a consistent assignment of statuses to every component.
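
The construction of candidates and their complements can be reproduced with
a brute-force sketch. It is written for clarity rather than efficiency and
is not how XDE or GDE compute covering sets; the function name and the string
encoding of assumptions are assumptions made for this example.

```python
from itertools import combinations

def minimal_covering_sets(conflicts):
    """Return the universe of assumptions and all minimal sets of
    assumptions that intersect every conflict."""
    universe = sorted(set().union(*conflicts))
    covers = []
    # Enumerate by increasing size so that any superset of an
    # already-found cover can be rejected as non-minimal.
    for size in range(1, len(universe) + 1):
        for combo in combinations(universe, size):
            chosen = set(combo)
            hits_all = all(chosen & conflict for conflict in conflicts)
            is_minimal = not any(cover <= chosen for cover in covers)
            if hits_all and is_minimal:
                covers.append(chosen)
    return universe, covers

# The two conflicts from the clock generator example.
conflicts = [{"U25_w", "U30_w", "U32_w"}, {"U25_w", "U25_inactive"}]
universe, covers = minimal_covering_sets(conflicts)

# Each maximal consistent environment is the complement of a candidate.
environments = [set(universe) - cover for cover in covers]
```

Running this on the two conflicts yields three covering sets and the three
environments listed in the text.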

These  environments  denote  three  possibilities:  either  (i)  U30  and  U32  are 
working  and  U25  is  exhibiting  syndrome  inactive,  or  (ii)  U25  and  U30  are 
working  and  U32  has  status  other,  or  (iii)  U25  and  U32  are  working  and  U30 
has  status  other.  There  is  a  fourth  possibility,  that  U30  and  U32  are  working 
and  U25  has  status  other  —  it  might  be  neither  working  nor  inactive.  Each 
maximal  consistent  environment  that  contains  assumptions  about  syndromes 
yields  several  diagnoses,  one  for  each  subset  of  those  assumptions.  In  this 
case  there  is  only  one  such  assumption  and  hence  only  one  extra  diagnosis. 
This  yields  four  diagnoses  in  all,  three  corresponding  to  maximally  consistent 
environments  and  one  created  by  deleting  assumptions  about  syndromes  from 
those  environments. 
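
The step from environments to diagnoses can be sketched as follows: for each
environment, every subset of its syndrome assumptions is deleted in turn.
The names below are hypothetical and the representation (sets of strings) is
chosen only for illustration.

```python
from itertools import combinations

# Assumptions that denote syndromes rather than "working" statuses.
SYNDROME_ASSUMPTIONS = {"U25_inactive"}

def diagnoses_from(environments, syndromes=SYNDROME_ASSUMPTIONS):
    """Generate one diagnosis per subset of the syndrome assumptions
    present in each maximal consistent environment."""
    result = set()
    for env in environments:
        present = sorted(env & syndromes)
        for k in range(len(present) + 1):
            for dropped in combinations(present, k):
                result.add(frozenset(env - set(dropped)))
    return result

environments = [
    {"U30_w", "U32_w", "U25_inactive"},
    {"U25_w", "U30_w"},
    {"U25_w", "U32_w"},
]
diagnoses = diagnoses_from(environments)
```

Using a set for the result means that a diagnosis produced by more than one
environment is kept only once.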

Each diagnosis that XDE generates in this manner specifies a single status
for each component mentioned by any assumption. For brevity of notation,
a diagnosis is denoted [ ... ] and shows only the component statuses that are
not working. For example, [U25_inactive] denotes a diagnosis in which only
the assumptions U30_w, U32_w, and U25_inactive are present. [U25_other]
denotes a diagnosis in which only U30_w and U32_w are present.

Each diagnosis has an initial likelihood corresponding to the prior
probability that every component has the status assigned, assuming
independence between components³. The distribution assigned to each set of
component statuses is derived from the physical complexity of the component,
as described in Chapter 6. The weight assigned to each diagnosis is its
likelihood normalized over all diagnoses (Unn_other and Unn_inactive are
hereafter abbreviated to Unn_o and Unn_i):

³ Although BASIL guarantees that physical components do not share parts so
that their failures can be assumed to be independent, XDE does handle the
more general case of shared parts. Each maximal consistent environment may
have several independent subsets of assumptions, each of which would derive
the same consequences as the full environment. XDE computes the likelihood of
diagnoses by taking the maximum likelihood of any independent subset, which
is combinatorially expensive if independence is not maintained. Although this
has not been explored extensively, XDE should thereby be able to correctly
assign likelihoods to diagnoses that involve dependent failures, since it
would compute that likelihood based only on the likelihood of the original
(independent) failure.


Diagnosis    Likelihood    Weight

Component    Candidate Weights            Repair Weight

U30          .53                          = .53
U32          .46 + .0046 + .000024        = .46
U25          .0085 + .0046 + .000024      = .013

The  component  status  likelihood  estimates  can  be  perturbed  greatly  and 
still  yield  the  same  candidate  rankings.  It  is  the  relative  magnitudes  of  the 
likelihoods  associated  with  statuses  other  than  working  in  different  com¬ 
ponents  that  matter,  not  their  particular  values.  In  the  case  of  the  clock 
generator,  for  example,  the  same  rankings  would  have  been  obtained  had  the 
complexity  of  the  oscillator  been  estimated  as  low  as  40  (instead  of  100),  and 
as long as not all oscillator failures resulted in status inactive. The table
below  shows  some  examples  of  how  much  variation  there  can  be.  Each  of  the 
last  four  columns  of  the  table  below  shows  an  alternative  set  of  component 
status  likelihoods  that  result  in  the  same  candidate  rankings  as  above: 


Component    Status      Likelihood

U25          working     .60     .80     .999
             inactive    .20     .15     .0005
             other       .20     .05     .0005    .20

U30          working     .70     .85     .9998    .85
             other       .30     .15     .0002    .15

U32          working     .80     .90     .9999    .9999
             other       .20     .10     .0001    .0001

Note  that  the  results  remain  stable  even  though  likelihoods  of  the  other 
and  inactive  statuses  vary  by  orders  of  magnitude,  so  long  as  their  order  is 
preserved. 

The scheme that XDE uses for generating and ranking diagnoses is
expensive. Both GDE and XDE suffer from combinatorial explosion of
candidates, but the refinement operation that XDE provides exacerbates the
problem. In pathological cases the number of candidates (or maximal
consistent environments) can be exponential in the number of conflicts, hence
exponential in the number of components. In XDE, the number of candidates is
at least exponential in the number of syndromes installed. Suppose there are
n components C_i and each has one syndrome S. Then there are 2n assumptions,
n of the form C_w and n of the form C_S. There are at least n conflicts
(C_w, C_S). These n conflicts share no assumptions and if there are no other
conflicts, there will be at least 2^n candidates. In experiments with the
current implementation of XDE, the time taken by each new refinement
operation approximately doubled, and the experiment was stopped after the
eighth refinement. Furthermore, there may be many maximal consistent
environments containing more than one syndrome assumption. This has two
undesirable consequences.
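
The 2^n growth can be checked directly. When the n conflicts are pairwise
disjoint, a minimal covering set picks exactly one assumption from each, so
the candidates are the Cartesian product of the conflicts. The sketch below
uses hypothetical assumption names of the form Ci_w and Ci_S.

```python
from itertools import product

def candidates_for_disjoint_conflicts(n):
    """Candidates for n disjoint conflicts (Ci_w, Ci_S): one assumption
    is chosen from each conflict."""
    conflicts = [(f"C{i}_w", f"C{i}_S") for i in range(1, n + 1)]
    return [frozenset(choice) for choice in product(*conflicts)]

# Number of candidates after 1 through 8 syndromes have been installed.
counts = {n: len(candidates_for_disjoint_conflicts(n)) for n in range(1, 9)}
```

The count doubles with each added syndrome, which is consistent with the
doubling of refinement time reported above, although the sketch says nothing
about the actual running time of XDE.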

The first undesirable consequence is that a maximal consistent
environment with n syndrome assumptions generates 2^n diagnoses, one for each
combination of those n syndromes. For example, if it is consistent for X to
have failure status S1 and Y to have failure status S2 simultaneously, then
it is also consistent for X to have status S1 and Y to have status other,
and vice versa. Thus one maximal consistent environment generates three
diagnoses. Although several different maximal consistent environments may
generate the same diagnoses, the potential for further combinatorial explosion
is present. XDE does not do anything about this problem. It maintains the
complete set of maximal consistent environments and diagnoses computable
from the current conflicts.

The  second  undesirable  consequence  is  that  syndromes  add  new  informa¬ 
tion  to  the  behavior  model  from  which  many  useless  deductions  will  be  made 
unless  some  additional  control  is  exercised.  Since  syndromes  usually  have  low 
likelihoods,  environments  containing  multiple  syndromes  will  have  exception¬ 
ally  low  relative  likelihoods.  Each  syndrome  results  in  new  predictions  being 
made  in  the  behavior  model;  for  example,  the  inactive  syndrome  for  oscil¬ 
lators  results  in  the  prediction  that  the  frequency  of  the  oscillator  output  is 
0.  Since  the  predictions  from  different  syndromes  will  interact,  there  will  be 
many  predictions  that  are  present  only  in  environments  of  very  low  likelihood. 
To  deal  with  this  problem,  XDE  controls  the  running  of  rules  in  such  a  way 
as  to  avoid  doing  work  in  environments  of  low  likelihood.  XDE  pays  the  price 
of  explicitly  switching  from  one  maximal  consistent  environment  to  the  next, 
making  predictions  only  in  that  one  environment,  and  thereby  only  working 
on  a  few  diagnoses  at  a  time.  This  allows  XDE  to  look  for  contradictions 
only  in  the  diagnoses  with  the  highest  weights,  never  making  deductions  in 
environments whose likelihoods lie below a fixed threshold percentile.
Explicit context switching is a high price to pay for this control, because
the worst-case overhead is proportional to the total number of clauses times
the number of diagnoses that get explored. However, it is possible to get the
best of both worlds, and [deKleer86b] and [Geffner86] both demonstrate schemes
upon which a more efficient implementation might be built someday.

To  summarize,  the  procedure  that  XDE  performs  whenever  a  new  conflict 
is  discovered  is  as  follows: 

1.  Update the set of maximal consistent environments. Maximal consistent
environments are the complements of candidates as constructed by GDE.

2.  Generate the set of diagnoses from the maximal consistent environments.
Diagnoses are the subsets of the maximal consistent environments obtained by
deleting syndrome assumptions.

3.  Assign  a  probability  to  each  diagnosis.  Since  each  diagnosis  assigns  a 
status  to  every  component  mentioned  by  the  universe  of  assumptions, 
the  probability  of  the  diagnosis  is  computed  as  the  probability  of  the 
conjunction  of  all  those  statuses. 

4.  Normalize  the  probability  of  each  diagnosis  with  respect  to  all  the  other 
diagnoses.  This  is  the  weight  of  each  diagnosis. 

5.  Compute  the  repair  weight  of  each  component.  The  repair  weight  is 
the  sum  of  the  weights  of  diagnoses  in  which  that  component  is  broken. 
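
Steps 3 through 5 of the procedure above can be sketched as follows,
assuming steps 1 and 2 have already produced the four diagnoses of the clock
generator example. The status distributions are invented for illustration
and are not the values used in the thesis; the function names are
hypothetical.

```python
from math import prod

# Hypothetical status distributions for each component.
status_prior = {
    "U25": {"working": 0.990, "inactive": 0.0098, "other": 0.0002},
    "U30": {"working": 0.997, "other": 0.003},
    "U32": {"working": 0.9975, "other": 0.0025},
}

# Each diagnosis assigns exactly one status to every component.
diagnoses = {
    "[U25_i]": {"U25": "inactive", "U30": "working", "U32": "working"},
    "[U30_o]": {"U25": "working", "U30": "other", "U32": "working"},
    "[U32_o]": {"U25": "working", "U30": "working", "U32": "other"},
    "[U25_o]": {"U25": "other", "U30": "working", "U32": "working"},
}

def likelihood(assignment):
    """Step 3: probability of the conjunction of statuses (independence assumed)."""
    return prod(status_prior[comp][status] for comp, status in assignment.items())

def rank(diagnoses):
    likelihoods = {name: likelihood(a) for name, a in diagnoses.items()}
    total = sum(likelihoods.values())
    # Step 4: normalize over all diagnoses.
    weights = {name: value / total for name, value in likelihoods.items()}
    # Step 5: repair weight sums the weights of diagnoses in which the
    # component has any status other than working.
    repair = {}
    for name, assignment in diagnoses.items():
        for comp, status in assignment.items():
            if status != "working":
                repair[comp] = repair.get(comp, 0.0) + weights[name]
    return weights, repair

weights, repair = rank(diagnoses)
```

Because every diagnosis here blames a single component, the repair weights
also sum to 1; with multiple-fault diagnoses they would not.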

If  no  syndromes  are  ever  introduced,  the  set  of  diagnoses  is  the  same  as 
the  set  of  maximally  consistent  environments,  and  the  ranking  is  then  ex¬ 
actly  as  in  GDE.  The  addition  of  syndromes  into  that  basic  troubleshooting 
engine  obviously  introduces  complexities  into  the  generation  and  ranking  of 
diagnoses.  The  advantage  of  doing  so  is  that  introducing  a  new  syndrome 
assumption  into  an  existing  set  of  diagnoses  can  drastically  shift  the  distri¬ 
bution  of  weights  among  the  diagnoses,  provided  that  the  syndrome  turns 
out  to  produce  new  conflicts  with  existing  or  subsequent  observations.  For 
example,  in  the  clock  generator  used  as  an  example  throughout  this  section, 
without the syndrome U25_inactive the observation that (11 (out y u32a))
was changing would have added no new information and the oscillator would
have remained a likely diagnosis. With it, the weights of candidates involving
U25 are all reduced below 2% each.

7.4 Making Observations


XDE selects informative observations using the same heuristic one-level
lookahead strategy as GDE, but there are complications that arise in the digital
circuit  domain  as  represented  in  TINT.  Among  these  complications  are  that 
(i)  imprecise  predictions  hamper  the  ability  of  the  lookahead  strategy  to  make 
good  choices,  (ii)  observations  must  be  temporally  quantified,  and  (iii)  the 
possible  observations  have  differing  granularities  and  costs.  XDE  has  partial 
solutions  to  the  latter  two  problems,  but  the  problems  resulting  from  impre¬ 
cise  predictions  are  fundamental  to  any  representation  that  trades  precision 
for  efficiency.  After  a  brief  review  of  the  probe  selection  strategy,  each  of 
these  issues  will  be  considered  in  turn. 

The  expected  information  from  a  given  observation  can  be  quantified 
using  the  entropy  of  the  possible  outcomes  of  the  observation.  The  entropy 
is the sum of p_i log p_i where i ranges over all outcomes and each p_i is the
combined  weight  of  the  diagnoses  that  predict  outcome  i.  Continuing  the 
clock-generator  example  from  above,  the  following  set  of  diagnoses  and  their 
weights  result  after  the  initial  symptom  is  discovered: 


Diagnosis    Likelihood    Weight

[U25_i]      .00984        .623
[U30_o]      .00315        .200
[U32_o]      .00276        .175
[U25_o]      .000051       .00323

The  behavior  model  makes  many  predictions;  a  small  sample  is  shown 
below  along  with  the  environments  in  which  they  hold  and  the  weights  of 
the  diagnoses  that  are  definitely  consistent  with  those  environments.  The 
prediction  that  the  output  of  U30a  is  not  changing  is  true  in  the  empty 
environment  and  so  is  known  to  be  consistent  with  all  the  diagnoses: 


                                            Value at                       Weights of
Signal                                      time 10^10   Environments      Consistent Diagnoses

(changing-wrt 0 10^10 (11 (out 0 u25a)))    t            {U25_w}           .200, .175
(changing-wrt 0 10^10 (11 (out 0 u25a)))    nil          {U25_i}           .623
(changing-wrt 0 10^10 (11 (out y u32a)))    t            {U25_w, U32_w}    .200
(changing-wrt 0 10^10 (11 (out y u32a)))    nil          {U25_i, U32_w}    .623
(changing-wrt 0 10^10 (11 (out 0 u30a)))    nil          {}                .623, .175, .200, .0051

Each  of  the  ports  (out  0  u25a),  (out  y  u32a),  and  (out  0  u30a)  can 
be  observed  to  see  whether  its  logic-level  signal  is  changing.  The  expected 
benefit  of  making  an  observation  at  each  port  is  the  negative  of  the  entropy 
of  the  distribution  of  weights  among  the  various  outcomes.  An  approximate 
version  of  the  computation  is  shown  in  the  table  below. 

Port            Sum over -p_i log p_i

(out 0 u25a)    -(.200 + .175) log(.200 + .175)
                -(.623) log(.623)                   = .956
(out y u32a)    -(.200) log(.200)
                -(.623) log(.623)                   = .890
(out 0 u30a)    -1 log 1                            = 0.0


The  last  line  shows  that  probing  a  signal  that  has  already  been  observed 
has  zero  value.  The  other  values  indicate  that  probing  the  output  of  the 
oscillator  u25a  maximizes  the  expected  information  and  so  is  preferable  to 
other  probes  (when  different  probes  yield  the  same  estimated  information 
XDE  picks  one  of  them  essentially  at  random). 
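
The approximate computation in the table can be reproduced with a short
sketch. The tabulated values .956 and .890 come out when the logarithm is
taken to base 2, so base 2 is assumed here. The function name is
hypothetical, and weight belonging to diagnoses that make no prediction is
simply ignored, as in the approximate table.

```python
from math import log2

def expected_information(outcome_weights):
    """Sum over -p_i log2 p_i for the combined weights of each outcome."""
    return -sum(p * log2(p) for p in outcome_weights if p > 0)

# Combined weights of the diagnoses predicting each outcome at each port.
probes = {
    "(out 0 u25a)": [0.200 + 0.175, 0.623],
    "(out y u32a)": [0.200, 0.623],
    "(out 0 u30a)": [1.0],
}

scores = {port: expected_information(w) for port, w in probes.items()}
best = max(scores, key=scores.get)
```

The port with the highest score is the oscillator output, matching the
choice described in the text.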

The relative likelihoods of component statuses working, other, and so
forth impact the probe selections by influencing the weights of diagnoses.
Diagnoses with high weights tend to bias XDE toward choosing probes in the
vicinity of the components they mention. For example, had the likelihood of
failure in U30 been greater than the likelihood of failure in the oscillator,
the probe at (out y u32a) would have been chosen instead. Roughly, the higher
the repair weight of a component (that is, the more diagnoses it appears in
and the higher the relative likelihoods of those diagnoses) the more highly
ranked the probes in its vicinity.

7.4.1  Prediction  Strength  and  Probe  Selection 

Weak  predictions  of  behavior  cause  the  troubleshooting  engine  to  make  poor 
estimates  of  the  information  to  be  obtained  at  possible  probe  points.  This 
in  turn  may  cause  it  to  wastefully  ask  for  observations  that  do  not  produce 
any  informative  conflicts.  As  discussed  earlier,  there  are  several  reasons  why 
the  behavior  representation  may  be  unable  to  make  predictions:  (i)  abstrac¬ 
tions  may  result  in  component  behaviors  not  being  total  functions,  (ii)  local 
propagation  of  signal  values  may  reach  impasses,  or  (iii)  the  behavior  of  com¬ 
ponents  may  be  too  complex  for  there  to  be  any  good  antibehavior  rules.  In 
the  clock  generator  example  being  used  at  the  moment,  the  reason  is  that 
the  antibehavior  rules  have  been  disabled  for  presentation  purposes. 

Weak  predictions  raise  the  technical  problem  of  estimating  the  expected 
information  from  a  probe  when  some  diagnoses  make  no  prediction  about  the 
outcome of the probe. For example, [U25_o] makes no prediction about the
signal  at  port  (out  0  u25a).  The  problem  is  that  computing  the  entropy 
requires  a  distribution  of  probabilities  that  sum  to  1.  There  are  at  least  four 
ways  of  handling  the  weight  that  should  be  distributed  among  the  diagnoses 
that  make  no  prediction: 

1.  Assume that the other diagnoses predict some value that is different from
all the explicitly predicted values. This is an optimistic assumption and
tends to overestimate the information from a probe. For example, suppose that
diagnoses carrying .5 of the weight predict that a particular signal will be
changing, but the others make no prediction. The information .69 in this case
would be computed the same as if all those other diagnoses had predicted the
signal would not be changing. But suppose that diagnoses carrying weight
.33 predict it will be changing, and others carrying .33 predict it will not.
This method would estimate the information as 1.09, although there are
really only two possible outcomes and the information cannot possibly be
more than 1.

2.  Assume that the other diagnoses predict the value that is likeliest among
the possible values. This is a pessimistic assumption, tending to
underestimate the information. For example, if diagnoses carrying .4 predict
the signal is changing and others carrying .3 predict it is not, the result is
computed as if the distribution had been .7 and .3, so the information is .61.
If diagnoses carrying weight .5 predict a signal is changing and the rest make
no prediction, the result is computed as if all diagnoses had predicted it
would be changing too. Thus the information is 0.

3.  Assume that the distribution of outcomes among the remaining diagnoses
matches the distribution among the explicitly predicted outcomes. In general
this provides more optimistic estimates than a method in which all possible
outcomes are known, but an overly pessimistic estimate of 0 in the case where
only one outcome has been explicitly predicted.

4.  Assume that all possible outcomes are equally likely, and distribute the
weight among them. This is the method used by GDE. Suppose for example
that there are four possible outcomes a through d, with p(a) = .3, p(b) = .2,
p(c) = .1, and p(d) = 0. This leaves a weight of .4, and this method yields a
distribution of p(a) = .4, p(b) = .3, p(c) = .2 and p(d) = .1, and information
of 1.28. The number of outcomes can be treated as +∞ if not known.

This last method usually makes estimates that fall between those of the
first  and  second  methods  above,  and  does  not  exhibit  the  anomalous  behavior 
of  the  third  when  only  one  outcome  has  been  explicitly  predicted.  It  has  other 
anomalies,  however.  Consider  a  signal  X  that  is  completely  disconnected  from 
the  current  set  of  candidates.  No  diagnosis  predicts  whether  it  is  changing  or 
not.  According  to  this  method,  probing  X  is  more  informative  than  probing 
a  signal  Y  that  two-thirds  of  the  diagnoses  predict  will  be  changing  and  that 
the  other  one-third  predict  will  not. 
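
The four ways of handling unpredicted weight can be compared with a small
sketch. The worked figures in the text (.69, 1.09, .61, 1.28) come out when
the natural logarithm is used, so that is what is assumed here. The function
names are hypothetical, and `predicted` holds the combined weights of the
explicitly predicted outcomes.

```python
import math

def information(dist):
    """Sum over -p log p (natural log); zero-weight outcomes contribute nothing."""
    return -sum(p * math.log(p) for p in dist if p > 0)

def method1_distinct_value(predicted):
    """Unpredicted weight goes to one extra outcome unlike any predicted one."""
    rest = 1.0 - sum(predicted)
    return information(list(predicted) + [rest])

def method2_likeliest_value(predicted):
    """Unpredicted weight is added to the likeliest predicted outcome."""
    rest = 1.0 - sum(predicted)
    dist = sorted(predicted, reverse=True)
    dist[0] += rest
    return information(dist)

def method3_proportional(predicted):
    """Unpredicted weight is spread in proportion to the predicted weights."""
    total = sum(predicted)
    return information([p / total for p in predicted])

def method4_uniform(predicted, n_outcomes):
    """Unpredicted weight is spread equally over all n_outcomes outcomes,
    including outcomes that no diagnosis predicts."""
    rest = 1.0 - sum(predicted)
    padded = list(predicted) + [0.0] * (n_outcomes - len(predicted))
    return information([p + rest / n_outcomes for p in padded])

# The anomaly of the fourth method: a signal X that no diagnosis predicts
# scores higher than a signal Y split two-thirds to one-third.
x_score = method4_uniform([], 2)
y_score = method4_uniform([2 / 3, 1 / 3], 2)
```

In this sketch x_score exceeds y_score, which is the anomaly described
above for signals disconnected from the current candidates.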

XDE uses method 4 because it makes reasonable estimates and its principal
anomaly is easy to avoid: signals for which no diagnosis predicts a value
are never probed. The values XDE computes for each of the three probes
are shown below. These are more accurate versions of the approximate values
shown earlier, although the differences are very small and the relative
rankings in this case have not changed.

Port            Sum over -p_i log p_i

(out 0 u25a)    -(.200 + .175 + .0025) log(.200 + .175 + .0025)
                -(.623 + .0025) log(.623 + .0025)
(out y u32a)    -(.200 + .088) log(.200 + .088)
                -(.623 + .088) log(.623 + .088)
(out 0 u30a)    -1 log 1                                           = 0.0

7.4.2  Temporal  Quantification  and  Granularity 

The  behavior  of  a  circuit  can  be  observed  at  various  times  and  at  temporal 
granularities,  with  varying  cost  in  setup  time  and  difficulty.  XDE  currently 
has  a  simple  and  limited  treatment  of  these  issues. 

Signals  must  be  observed  over  time  intervals.  Each  observation  in  XDE 
is  a  TINT  thru  predication  and  is  part  of  some  signal  history.  The  expected 
information gain from the probing of any signal is the maximum for any
interval  during  its  history.  Thus,  when  XDE  suggests  that  (say)  signal  (11  X) 
be  probed,  it  means  that  there  is  some  interval  of  its  history  during  which 
an  observation  would  be  useful.  XDE  presents  to  the  user  the  entire  signal 
history  of  (11  X)  and  abstractions  of  it  along  with  some  typical  misbehaviors 
(a  constant  1,  for  example).  The  actual  observations  made  of  the  device  will 
probably  correspond  to  one  of  the  intervals  already  presented;  if  not,  then 
an  interval  describing  the  observation  can  simply  be  typed  in.  For  example, 
XDE  may  expect  the  value  to  be  observed  at  a  certain  signal  to  be  either  10 
or  12,  and  so  presents  those  as  options;  if  the  actual  observation  was  13  that 
can  be  typed  in  too.  All  observations  are  assumed  to  be  completely  accurate 
in  terms  of  the  signal  values  observed  and  the  intervals  over  which  they  were 
seen. 

The  default  interval  over  which  signals  are  to  be  observed  is  denoted  by 
a  “global  reference”  timeline  denoted  by  the  pseudo-signal  GR.  The  assertion 
[thru  ?a  ?z  GR  t]  means  that  observations  are  made  by  default  with  re¬ 
spect  to  the  time  interval  ?a  to  ?z  inclusive.  The  interval  ?a  to  ?z  is  referred 
to  as  the  current  “observation  interval,”  which  is  automatically  changed  as 
the  user  adds  new  observations.  The  usual  default  is  the  ten  second  interval 
from 0 to 10^10 nsec inclusive.

In a real troubleshooting session, the circuit board continues its
behavior while the troubleshooter thinks about what to do next, and each new
observation is made at a later time than the last. Since it would be unwise
to assume that the circuit is not changing its state, the troubleshooter
ordinarily forces it into a known state before making each new observation
(by pressing a “reset” button, for example). The troubleshooter ordinarily
further assumes that if the observations of the circuit are made more than
once, the same results will be obtained each time. XDE has these assumptions
built into it. For example, in troubleshooting the Audio Decoder each
new observation is added over the interval from 0 to 10^10, rather than
making each observation come after the previous one. Similarly, in the Input
Encoder troubleshooting example, observations are added over the intervals
(-∞, +∞), [1 × 10®, 2 × 10®], [2 × 10®, +∞), and [0, 10^10], in that order.
It is assumed that each new observation is made after pressing the reset
button and providing identical test inputs to those before, so that the same
behavior predictions are obtained.

Observations  of  different  kinds  of  signals  at  different  locations  have  differ¬ 
ent  costs  in  setup  time.  Currently  XDE  only  allows  signals  to  be  observed  at 
the  external  ports  of  pins,  where  the  pin  meets  the  etch  (although  for  clarity 
most  of  the  examples  elsewhere  show  observations  being  added  at  the  clos¬ 
est  port  of  some  functional  component).  Observations  also  cannot  be  made 
over  intervals  shorter  than  one  second.  XDE  associates  a  numerical  cost  with 
each  possible  probe,  and  its  probe  suggestions  are  biased  to  favor  cheaper 
observations  by  multiplying  the  expected  information  of  each  probe  times  its 
cost.  The  costs  currently  used  are  as  follows;  they  are  estimates  based  on 
the  relative  ease  of  making  the  observation: 

•  Observing  whether  a  logic-level  signal  is  1  or  0  all  through  the  current 
observation  interval  costs  1.0.  This  is  the  most  basic  kind  of  observation 
and  involves  placing  a  single  probe. 

•  Observing  whether  a  logic-level  signal  is  changing  with  respect  to  the 
current  observation  interval  costs  0.9.  This  is  slightly  easier  than  view¬ 
ing  the  actual  value  of  the  signal. 

•  Observing the swing of a voltage with respect to the current observation
interval costs 0.9. Observing the amplitude of a signal is judged to have
about the same difficulty as judging whether it is changing or not.

•  Observing the frequency of a logic-level or voltage signal during the
current observation interval costs 1.1, since it may require adjusting
the temporal resolution of the oscilloscope.

•  Observing  the  value  of  a  signal  sampled  with  respect  to  a  clock  costs 
2.0,  since  it  involves  setting  up  two  probes,  one  a  strobe  for  the  other. 

•  Observing  the  frequency  of  a  two-phase  clock  signal  costs  2.0,  since  it 
too  involves  setting  up  two  probes. 

Observing the outputs of the Input Encoder costs 1.0 no matter where
they are physically located; these are assumed to be observable through other
hardware not explicitly represented. The brightness of the console screen, for
example, is an indirect way to observe the brightness signal.

7.5  Evaluation 

Testing  and  diagnostic  programs  are  usually  evaluated  by  their  coverage  (the 
range  of  faults  they  can  detect),  resolution  (the  accuracy  with  which  they 
can  identify  any  fault  actually  present)  and  speed  (as  measured  by  the  time  it 
takes  the  running  program  to  isolate  the  fault).  The  combined  troubleshoot¬ 
ing  system  of  XDE,  TINT,  and  BASIL  can  be  evaluated  this  way  too,  although 
it  is  important  to  distinguish  which  subsystem  is  responsible  for  the  quality 
achieved along each dimension. In model-based troubleshooting, coverage,
resolution, and speed all depend critically on the ability to detect conflicts
between the actual behavior of the device and its predicted behavior. XDE
cannot do anything without those conflicts; if the model is too weak to
produce predictions that are falsifiable by observations, then XDE will ask for
many observations but make no progress toward isolating the fault. Thus
the importance of the device representation far outweighs that of the
troubleshooting engine.

7.5.1  Coverage 

XDE  needs  to  discover  at  least  one  discrepancy  before  it  starts  generating 
diagnoses.  TINT,  therefore,  must  represent  enough  detail  about  the  behavior 
of the circuit as a whole to detect any misbehavior worth repairing. This does
not imply that every misbehavior of every individual component needs to be
detectable, although that is one way to guarantee coverage. For example, if
the specifications of the Console Controller Board say that the screen
brightness should increase in response to the “b” command, but do not specify how
fast, then it is probably okay to represent that rate of change qualitatively
instead of quantitatively. Any faults whose only effect would be to slow down
the rate of advance would not be detected. The coverage provided by a
behavior model is thus relative to the desired function of the whole device and
to the detail of the observations.

The  representation  of  the  Console  Controller  Board  in  TINT  is  an  in¬ 
complete  prototype  in  this  respect,  since  there  are  some  functions  of  the 
board  that  its  behavior  definitions  are  too  temporally  coarse  to  represent. 
For  example,  if  the  board  were  faulty  in  such  a  way  that  large  motions  of 
the  mouse  across  the  table  were  to  result  in  only  small  and  sporadic  motions 
on  the  screen,  this  would  surely  be  considered  a  misbehavior.  But  since  the 
TINT signals only represent the motion qualitatively, TINT cannot describe the
misbehavior. A rough measure of the coverage that the representation
provides is to count the most common classes of faults and determine which of
them result in misbehaviors that can be distinguished. Among the most
common faults are those that cause individual pins to act as open circuits. The
Audio  Decoder,  for  example,  has  nine  chips  having  some  160  pins  between 
them.  Of  these  160,  failures  in  all  but  30  would  be  detectable  as  discrepancies 
in  the  swing,  frequency,  and  frequency  in  the  first  derivative  of  the  voltage 
output  of  the  digital-to-analog  converter.  Coverage  of  80%  of  the  common 
faults  from  only  these  three  features  of  the  output  voltage  is  not  bad,  and 
would  probably  be  improved  with  more  detailed  behavior  rules  for  the  shift 
registers  and  counters. 
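
The coverage figure quoted above follows directly from the pin counts. A small worked calculation, using only the numbers given in the text (the function name is illustrative):

```python
def pin_fault_coverage(total_pins, undetectable_pins):
    """Return (detectable pin faults, fraction of pin faults detectable)."""
    detectable = total_pins - undetectable_pins
    return detectable, detectable / total_pins


# Audio Decoder: some 160 pins, of which failures in all but 30 are detectable.
detectable, coverage = pin_fault_coverage(160, 30)
print(detectable, coverage)
```

The 130 detectable faults give a ratio of 130/160, slightly above the rounded 80% figure.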


7.5.2  Resolution 

A  model-based  troubleshooting  program  provides  diagnostic  resolution  in 
proportion  to  the  structural  and  behavioral  detail  that  the  device  model 
provides.  The  program  cannot  of  course  distinguish  between  components 
that are not represented separately. BASIL, for example, represents an entire
etch as a single component, so a break anyplace in the etch results in the same
diagnosis. A subtler problem is that even failures in components represented
separately cannot be distinguished if their behavior models and observations
are insufficiently detailed. For example, Figure 7.8 shows a two-component
device  whose  A  and  B  components  have  the  behaviors  A  and  B.  Suppose  that 
x  and  z  have  been  observed  and  a  discrepancy  detected  at  z.  (Aw,  Bw)  is  a 
conflict  and  the  diagnoses  are  [Aw]  and  [Bw]. 

Figure  7.8:  Distinguishing  Between  Diagnoses 


To  distinguish  between  these  diagnoses  requires  an  observation  at  y  —  but 
it  also  requires  that  either  the  observation  contradict  (A  x),  or  that  (B  y) 
contradict  the  observation  at  z.  It  might  do  neither.  The  observation  might 
be  too  coarse,  the  behaviors  might  be  partial,  or  both.  There  is  nothing 
wrong  with  the  troubleshooting  engine;  short  of  having  an  exhaustive  set  of 
syndromes  for  A  or  B,  there  is  nothing  it  can  do.  The  model  is  too  weak. 
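
The relationship between conflicts and diagnoses in this example can be sketched in the GDE style, where the minimal diagnoses are the minimal sets of components that intersect every conflict. The brute-force enumeration below is an illustration under that assumption, not XDE's algorithm; the function name is hypothetical, and fault models and likelihoods are ignored.

```python
from itertools import combinations


def minimal_diagnoses(components, conflicts):
    """Enumerate minimal sets of components that intersect every conflict.

    A conflict is a set of components that cannot all be working.
    Candidates are tried smallest first, so any superset of an already
    accepted diagnosis is skipped as non-minimal.
    """
    conflicts = [frozenset(c) for c in conflicts]
    found = []
    for size in range(len(components) + 1):
        for combo in combinations(sorted(components), size):
            candidate = frozenset(combo)
            if any(previous <= candidate for previous in found):
                continue
            if all(candidate & conflict for conflict in conflicts):
                found.append(candidate)
    return found


# Discrepancy at z with only x and z observed: A and B cannot both be working.
before = minimal_diagnoses({"A", "B"}, [{"A", "B"}])

# If an observation at y contradicted the prediction of A alone, the new
# conflict {A} would leave a single diagnosis.
after = minimal_diagnoses({"A", "B"}, [{"A", "B"}, {"A"}])
print(before, after)
```

If the observation at y contradicts neither prediction, no new conflict is produced and the two diagnoses remain, which is the situation the text describes.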

The  temporally  abstract  models  of  the  Console  Controller  Board  cause 
problems  for  XDE  that  are  very  much  like  this  example.  In  principle  TINT  can 
represent  the  temporally  detailed  behavior  in  terms  of  logic-levels  0  and  1  for 
every  gate  in  a  circuit,  and  hence  in  principle  XDE  can  detect  misbehavior  in 
any  individual  component.  In  practice,  TINT  rules  only  cover  the  temporally 
coarse  behaviors  that  are  easy  to  observe.  For  example,  an  open  circuit  on 
a  control  input  of  the  shift  register  U21  might  result  in  all  its  parallel  data 
output  signals  changing,  although  in  seemingly  random  fashion  that  would 
show  up  on  the  digital-to-analog  converter  (U43)  output  as  such  (Figure  7.9). 

Even  assuming  that  every  visible  node  in  the  Audio  Decoder  were  probed 
to  see  whether  it  was  changing  or  not,  there  would  nevertheless  be  no  way 
to  distinguish  between  the  diagnoses  [U21otk«]  and  [U43otka].  The  outputs 
of  U21  are  not  represented  in  enough  temporal  detail  for  a  discrepancy  to  be 
detected  there.  More  generally,  among  the  130  detectable  common  faults  in 
the  Audio  Decoder,  about  half  of  them  are  distinguishable  down  to  a  single 
chip  and  the  remainder  result  in  this  kind  of  ambiguity.  Beyond  the  common 
faults,  it  is  probably  the  case  that  most  faults  internal  to  the  chips  would 
result in similar lack of resolution. The temporal detail of the predictions is
sufficient to allow many correct diagnoses but is insufficient to achieve perfect
resolution, even with exhaustive probing. Ultimately, given any particular
level of structural detail, if perfect resolution is desired there will always be
cases that require detailed timing information.

Figure 7.9: Detail of Audio Decoder


7.5.3  Speed 

An  appropriate  measure  of  the  speed  of  a  model-based  troubleshooting  pro¬ 
gram  is  the  number  and  cost  of  the  observations  it  requires  to  reach  its  final 
diagnosis.  This  is  a  meaningful  measure  so  long  as  the  device  model  provides 
enough  resolution  that  there  is  in  fact  such  a  thing  as  a  “final  diagnosis.”  If 
the  behavior  model  is  too  weak  or  the  observations  too  coarse  to  distinguish 
different  components,  XDE  eventually  quits  after  asking  for  all  possible  ob¬ 
servations.  The  speed  metric  in  that  case  is  hardly  meaningful.  Even  if  the 
model  and  observations  do  provide  sufficient  detail  to  discriminate  compo¬ 
nents  down  to  the  primitive  level  of  detail,  the  model  may  still  be  too  weak  to 
discover  genuine  conflicts  between  what  has  been  observed  and  what  should 
have  been.  In  that  case  more  observations  will  be  required  than  strictly  nec¬ 
essary.  The  probe  selection  strategy  used  by  XDE  has  a  number  of  heuristic 
aspects:  (i)  it  is  influenced  by  component  failure  rates  that  are  estimates, 
(ii) it estimates the benefits of probes with a one-level lookahead rather than
searching through all possible sequences of observations, and (iii) it estimates
information from probing signals whose predicted value is not known in all
diagnoses by assuming that all observation outcomes are equally likely.
However, no matter how good these heuristics are, in the long run their positive
impact on the probes actually chosen is unlikely to be nearly as strong as
the  negative  impact  of  a  device  model  that  cannot  make  full  use  of  the  ob¬ 
servations  actually  chosen.  The  cleverest  strategy  for  choosing  observations 
cannot  make  up  for  observations  and  models  that  are  too  coarse  to  detect 
discrepancies. 
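
Heuristics (ii) and (iii) can be illustrated with a small sketch of one-level lookahead. It computes the expected entropy of the diagnosis probabilities after a single observation, spreading the probability of any diagnosis that makes no prediction evenly over the possible outcomes. The data layout is an assumption made for illustration (each diagnosis is a pair of a probability and a predicted value, with `None` meaning "not predicted", and the probabilities are taken to sum to 1); it is not the representation XDE uses.

```python
import math


def entropy(probabilities):
    """Shannon entropy in bits, ignoring zero-probability entries."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


def expected_entropy(diagnoses, outcomes):
    """Expected entropy over diagnoses after observing one signal.

    diagnoses: list of (probability, predicted_outcome_or_None).
    outcomes:  the possible observation outcomes for the signal.
    """
    total = 0.0
    for outcome in outcomes:
        joint = []
        for probability, predicted in diagnoses:
            if predicted is None:
                # No prediction: assume every outcome is equally likely.
                joint.append(probability / len(outcomes))
            elif predicted == outcome:
                joint.append(probability)
            else:
                joint.append(0.0)
        p_outcome = sum(joint)
        if p_outcome > 0:
            total += p_outcome * entropy([j / p_outcome for j in joint])
    return total


splits = expected_entropy([(0.5, 0), (0.5, 1)], [0, 1])        # discriminating probe
useless = expected_entropy([(0.5, 0), (0.5, 0)], [0, 1])       # both predict the same
unknown = expected_entropy([(0.5, None), (0.5, None)], [0, 1]) # nothing predicted
print(splits, useless, unknown)
```

A probe at a signal that no diagnosis predicts leaves the entropy unchanged, which is one way to see why a weak model cannot be rescued by a clever probe strategy.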


7.6  Summary 

The  model-based  troubleshooting  engine  XDE  extends  GDE  by  incorporating 
hierarchic  diagnosis  and  fault  models.  Hierarchic  diagnosis  is  achieved  with 
the  decomposition  operation,  which  descends  one  level  at  a  time  through 
both  the  physical  and  functional  hierarchies  in  BASIL.  Knowledge  about 
how components fail, represented as syndromes, is used in the refinement
operation.  Syndromes  help  focus  the  troubleshooting  process  by  biasing  the 
suggestion  of  new  observations  away  from  components  unlikely  to  be  failing. 
XDE  can  suggest  observations  of  signals  at  various  temporal  resolutions,  and 
it  biases  its  suggestions  toward  those  that  are  cheaper. 

Like  all  model-based  troubleshooting  engines,  XDE  is  almost  totally  de¬ 
pendent  on  the  device  model  and  on  the  technology  for  observing  the  real 
device.  Obviously,  if  the  model  lacks  fidelity  its  diagnoses  may  be  incorrect. 
A subtler problem is that if the model is imprecise (that is, if it fails to produce
falsifiable predictions) the troubleshooting engine will be indiscriminate, never
reaching a conclusive diagnosis no matter how many observations are made.
In  light  of  this  dependence,  any  evaluation  of  the  quality  of  the  diagnoses 
that  XDE  produces  is  really  an  evaluation  of  the  quality  of  the  underlying 
device  model. 


Chapter  8 

Conclusions  and  Future  Work 


Model-based  troubleshooting  has  not  previously  scaled  up  to  deal  with  com¬ 
plex  devices  such  as  digital  circuit  boards.  This  is  because  traditional  analytic 
models  of  complex  devices  do  not  explicitly  represent  aspects  of  the  device 
that  are  important  for  troubleshooting.  This  report  has  described  a  digital 
circuit  representation  that  was  constructed  with  troubleshooting  explicitly  in 
mind,  a  representation  that  enables  the  general  model-based  troubleshooting 
engine  XDE  to  successfully  diagnose  failures  in  circuits  that  are  much  more 
complex  than  any  previously  attempted.  This  representation  is  embodied  in 
the  language  BASIL  for  representing  the  physical  and  functional  organiza¬ 
tion  of  circuits  and  in  the  temporal  reasoning  system  TINT  for  representing 
circuit behavior. The modeling principles that underlie these languages and
govern  their  use  concern  ways  in  which  features  of  the  circuit  relevant  to 
troubleshooting  can  be  made  explicit: 

•  Components  in  the  representation  of  the  physical  organization  of  the 
circuit  should  correspond  to  the  possible  repairs  of  the  actual  device. 

Making  the  elements  of  the  structure  representation  correspond  to  pos¬ 
sible  repair  actions  ensures  that  the  troubleshooting  program  will  not  waste 
effort  trying  to  discriminate  between  diagnoses  that  have  identical  repairs. 
BASIL  represents  circuits  using  a  strict  hierarchy  of  physical  components  that 
reflects  the  way  the  board  was  manufactured  and  hence  those  parts  that  can 
be  replaced. 

•  Components  in  the  representation  of  the  functional  organization  of  the 
circuit  should  facilitate  behavioral  abstraction. 




The only role that an explicit representation of functional organization
plays in model-based troubleshooting is to make behavior prediction more
efficient. In extracting the functional organization from a raw schematic the
modeler need only represent what will make the behavior easier to reason
with, rather than necessarily representing what the designer had in mind.
BASIL represents this functional organization using a nonstrict component
hierarchy  whose  leaves  are  shared  with  the  physical  hierarchy.  XDE  does  hi¬ 
erarchic  diagnosis  using  the  physical  and  functional  hierarchies  by  descending 
primarily  through  the  physical  hierarchy,  while  reasoning  about  the  behav¬ 
ior  of  functional  components  roughly  corresponding  to  each  level  of  physical 
detail. 

•  The  behavior  of  components  should  be  represented  in  terms  of  features 
that  are  easy  for  the  troubleshooter  to  observe. 

Some  features  of  time-varying  signals  are  easier  to  observe  than  others. 
In digital circuits, temporally coarse features of signals are easier to
observe than clock-cycle-by-clock-cycle behavior. TINT provides a framework
in which both abstractions and behaviors are functions from signals to
signals, along with a vocabulary of temporal abstractions including concepts
such as change, count, and frequency. Expressing the behavior of
components in these terms makes prediction more efficient while largely retaining
the  ability  to  detect  the  effects  of  common  faults. 
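
The idea of temporal abstractions as functions of signals can be sketched with three simple examples. The representation here is an assumption chosen for brevity: a signal is a list of logic levels sampled once per tick over the observation interval, and frequency is taken to be two transitions per cycle. TINT's actual signal representation is not shown in this section, so the function names are illustrative only.

```python
def count_changes(samples):
    """Number of transitions between consecutive samples."""
    return sum(1 for prev, cur in zip(samples, samples[1:]) if prev != cur)


def is_changing(samples):
    """Coarse 'change' abstraction: does the signal change at all?"""
    return count_changes(samples) > 0


def frequency(samples):
    """Cycles per tick, assuming two transitions per full cycle."""
    ticks = len(samples) - 1
    if ticks <= 0:
        raise ValueError("need at least two samples")
    return count_changes(samples) / (2 * ticks)


clock = [0, 1, 0, 1, 0, 1, 0, 1, 0]  # toggles on every tick
stuck = [1] * 9                      # e.g. an output held at logic 1

print(count_changes(clock), is_changing(stuck), frequency(clock))
```

A troubleshooter comparing only `is_changing` for the two signals would already detect the stuck output without examining any individual clock cycle.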

• The behavior of a component for which changes on its inputs always
result in changes on its outputs should be represented in temporally
coarse  terms. 

Given a set of temporal (or any other) abstractions, it is an interesting
and relevant question to ask: for what class of behaviors is it possible to
formulate easily computable and strong abstract behaviors? More
specifically, given the language TINT and its vocabulary of temporal abstractions,
for which components is it worth writing temporally abstract behavior rules?
In the case of temporal abstractions, the natural class of relevant
behaviors consists of those for which changes on inputs always result in changes on
outputs. Combinational behaviors expressible as one-to-one functions, as
well  as  toggles,  counters,  and  shift  registers,  fall  in  this  category. 
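
For combinational components, the property that every change on the inputs produces a change on the outputs is the same as the function being one-to-one, which can be checked by enumeration. The sketch below is illustrative only (the function name and the example gates are not from the thesis) and covers the combinational case, not toggles, counters, or shift registers.

```python
from itertools import product


def changes_always_propagate(function, n_inputs):
    """True if no two distinct input vectors give the same output."""
    seen = set()
    for bits in product((0, 1), repeat=n_inputs):
        output = function(*bits)
        if output in seen:
            return False
        seen.add(output)
    return True


inverter = lambda a: 1 - a
and_gate = lambda a, b: a & b
crossover = lambda a, b: (b, a)  # two outputs, inputs swapped

print(changes_always_propagate(inverter, 1),
      changes_always_propagate(and_gate, 2),
      changes_always_propagate(crossover, 2))
```

An AND gate fails the test because, for example, a change on one input while the other is held at 0 leaves the output unchanged, so a rule of the form "input changing implies output changing" would not be sound for it.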
• A temporally coarse behavior description that only covers part of the
behavior  of  a  component  is  better  than  not  covering  any  at  all. 

Although the full behavior of a component may be too complex to reduce
to  a  simple  relationship  between  (say)  the  number  of  changes  on  its  inputs 
and  the  number  of  changes  on  its  outputs,  there  may  be  a  useful  relationship 
that  involves  only  a  subset  of  its  inputs,  assuming  that  the  others  are  held 
constant.  Similarly,  there  may  be  a  useful  relationship  between  different 
signals  sampled  with  respect  to  a  common  clock.  TINT  rules  for  describing 
the  temporally  abstract  behaviors  of  components  ranging  from  boolean  gates 
to  microprocessors  capture  the  normal  behavior  of  those  components  using 
these  techniques. 

•  A  sequential  circuit  should  be  encapsulated  into  a  single  component  to 
enable  the  description  of  its  behavior  in  a  temporally  coarse  way. 

Although  the  individual  behaviors  of  the  components  in  a  sequential  cir¬ 
cuit  may  not  lend  themselves  to  temporally  coarse  descriptions,  the  group 
may  be  performing  a  simple  function  when  taken  as  a  whole.  Encapsulating 
the  group  of  components  makes  it  possible  to  apply  other  temporal  abstrac¬ 
tion  techniques  such  as  holding  inputs  constant.  In  many  troubleshooting 
situations,  it  will  be  unnecessary  to  ever  consider  the  individual  state  tran¬ 
sitions  of  its  sequential  behavior. 

•  An  explicit  representation  of  a  given  component  failure  mode  should 
be  used  if  the  underlying  failure  has  high  likelihood. 

Components  break  in  the  field  in  certain  ways  much  more  often  than 
in  other  ways.  XDE  takes  advantage  of  this  knowledge  by  extending  the 
multiple-faults  approach  of  GDE  [deKleer87]  to  use  fault  models.  The  notion 
of  a  syndrome  in  BASIL  and  TINT  captures  knowledge  about  the  likelihood, 
physical  causes,  and  local  behavioral  effects  of  failures.  Syndromes  are  ben¬ 
eficial  when  they  are  inconsistent  with  the  symptoms,  since  this  can  reduce 
the  ambiguity  among  the  possible  diagnoses.  A  syndrome  with  relatively 
high  likelihood  is  valuable  because  it  can  be  used  to  virtually  eliminate  an 
otherwise logically possible diagnosis.

• An explicit representation of a given component failure mode should be
used  if  the  resulting  misbehavior  is  drastically  simpler  than  the  normal 
behavior  of  the  component. 

If  a  component  with  normally  complex  behavior  has  some  potential  in¬ 
ternal  fault  or  faults  that  cause  it  to  misbehave  catastrophically,  then  any 
partially  correct  behavior  observed  for  the  component  makes  it  a  less  likely 
suspect.  Syndromes  that  simplify  the  behavior  of  a  component  are  useful  be¬ 
cause  their  effects  on  the  rest  of  the  device  are  relatively  efficient  to  predict. 

The  power  of  these  eight  principles  has  been  demonstrated  in  an  imple¬ 
mented  program  that  can  troubleshoot  problems  in  a  board-scale  circuit,  the 
Symbolics 3600 Console Controller Board. Testing the system on a wider
set of cases using the same board and modeling other boards are among
the important follow-up tasks that remain. The following
sections discuss three avenues of further research: (i)
improving  the  engineering  of  the  program,  (ii)  deriving  the  specialized  trou¬ 
bleshooting  representation  from  more  primitive  circuit  descriptions,  and  (iii) 
generalizing  the  methodology  beyond  the  domain  of  digital  circuits. 


8.1  Engineering  Issues 

The current implementations of XDE, BASIL, and TINT are demonstration
vehicles and are too slow to contemplate putting to serious use in
troubleshooting. Since their implementation details have not been presented in
this report it would be inappropriate to discuss in detail how those
implementations could be improved; nevertheless, it is worth mentioning certain
broad areas needing improvement.

Circuit structures are described in BASIL using the predicates isa, ako,
has-port, conn, status-of, and corr, and assertions using these predicates
are stored in the most naive and general fashion as patterns in a
discrimination net. A better implementation would store the components and
connections as frame instances [Minsky75] [Batali81] [Davis83]. JOSHUA provides a
substrate  for  doing  so  [Rowley87],  but  the  conversion  has  not  yet  been  done. 
Moreover,  while  building  the  BASIL  description  of  the  Console  Controller 
Board  the  lack  of  graphical  display  and  editing  facilities  was  keenly  felt; 
reimplementing BASIL using an existing design and layout language might
be no more difficult than using any arbitrary frame language while yielding
considerably  more  utility. 

TINT  is  slow  in  spite  of  its  simplicity.  As  with  BASIL,  assertions  involv¬ 
ing  the  predicates  thru  and  tsams  are  implemented  (primarily)  as  patterns 
in  a  discrimination  net,  and  this  generality  is  costly.  A  deeper  problem  is 
that  during  the  prediction  process  the  forward  chaining  from  these  assertions 
makes  many  deductions  about  time  intervals  that  later  turn  out  to  be  shad¬ 
owed.  This  problem  might  be  alleviated  if  assertions  were  made  about  time 
intervals  with  relative  endpoints  instead  of  fixed  integers.  The  predecessor 
to  TINT  was  a  temporal  constraint  propagator  MINT  that  used  inequalities 
over  time  points  in  that  fashion.  Goal-directed  reasoning  about  these  in¬ 
equality  constraints  was  integrated  with  the  forward  chaining  from  intervals 
of  signal  histories.  However,  the  effort  to  give  this  more  complex  program 
adequate  performance  turned  into  a  major  research  agenda  all  its  own  and 
was  suspended  in  favor  of  the  simpler  TINT  ontology.  The  next  incarna¬ 
tion  of  the  temporal  reasoning  subsystem  will  probably  not  be  a  data-driven 
constraint  propagator  at  all,  but  a  goal  driven  system  that  produces  more 
limited  predictions. 

The  hybrid  TMS  used  in  the  program  is  basically  a  single-context  TMS  to 
which  the  propagation  of  environments  and  labels  has  been  added.  Labeling 
each  assertion  with  its  minimal  environments  is  very  useful  for  probe  selec¬ 
tion,  implying  the  need  for  an  Assumption-based  TMS  architecture  (ATMS). 
On  the  other  hand  there  are  at  least  two  reasons  for  doing  explicit  context 
switching:  (i)  TINT  requires  that  some  assertions  be  “shadowed”  to  prevent 
rules  from  firing  on  them,  an  effect  that  seems  difficult  to  produce  in  an 
ATMS,  and  (ii)  unrestricted  rule  firing  in  environments  containing  several 
syndromes  would  be  wasteful  since  the  relative  likelihood  of  those  environ¬ 
ments  is  usually  very  small.  An  ATMS  that  provided  shadowing  and  efficient 
incremental  updating  of  environment  likelihoods  (so  as  to  support  best-first 
search  among  diagnoses)  would  be  a  good  replacement  for  the  current  hybrid 
TMS. 

XDE currently tries to use fault models, then tries to do hierarchic
diagnosis, and when those fail it selects probes. A better strategy would make
use of the number of outstanding diagnoses and ambiguity among the
diagnoses to choose the next operation. For example, when there are many
diagnoses, getting new observations is probably preferable to doing
decompositions. Experiments with a strategy based on the entropy of the current
set of diagnoses did not yield significant improvement, but better control is
clearly  necessary.  The  decomposition  operation  in  particular  is  invoked  far 
too  casually,  resulting  in  many  useless  predictions  being  made. 

Finally, XDE spends a surprising amount of time finding optimal probes.¹
The  basic  reason  is  that  the  amount  of  work  involved  each  time  a  probe  is 
chosen  is  proportional  to  the  number  of  possible  probes  times  the  number 
of  candidates.  However,  since  the  only  interesting  probe  is  the  one  having 
minimum  entropy,  there  ought  to  be  a  way  of  sorting  the  possibilities  so  that 
not  every  candidate  or  probe  needs  to  be  examined  every  time.  Also,  not 
every  candidate  likelihood  changes  between  observations,  so  there  should  be 
some  way  to  cache  parts  of  the  computation  from  one  probe  selection  to  the 
next. 


8.2  Deriving  the  Representation 

The  fact  that  the  behavior  rules  for  the  components  are  currently  all  hand¬ 
crafted  is  a  cause  for  concern.  In  the  short  term,  a  library  of  signal  definitions, 
behavior  constraints,  and  syndromes  has  been  accumulated  to  speed  up  the 
description  of  future  circuits.  However,  the  whole  process  needs  to  be  au¬ 
tomated:  the  troubleshooting  program  should  be  able  to  diagnose  a  circuit 
starting  only  from  a  primitive  representation  of  structure  and  part  specifica¬ 
tions  along  with  whatever  design  specifications  and  annotations  happen  to  be 
available  for  its  various  modules.  Presumably  this  would  be  done  by  building 
and  using  an  intermediate  representation  of  its  structure  and  behavior  along 
the  lines  described  in  this  report. 

Extracting  an  appropriately  abstracted  behavior  representation  from  an 
underlying  physical  structure  is  an  exceptionally  difficult  problem.  Since 
the  appropriate  abstractions  to  be  used  for  describing  circuit  behavior  are 
bound  to  capture  some  of  the  intended  function  of  the  circuit,  there  are  close 
connections  between  this  and  the  function-from-form  problem.  [deKleer78] 
presents  as  a  key  insight  a  teleological  constraint:  the  correct  interpretation 
of  the  function  of  a  designed  artifact  must  assign  some  role  to  every  structural 
element.  A  latent  flaw  in  that  particular  approach  was  that  the  target  rep¬ 
resentation  of  circuit  function  seemed  to  exist  in  a  vacuum,  having  no  role  in 
any  problem  solver.  In  FUNSTRUX  [Hall87],  by  contrast,  simulation  models 

¹ So does GDE (B. Williams, personal communication).



of digital circuit elements are symbolically composed into simulation
models for aggregate structures; the compositions and subsequent simplifications
are strongly guided by the goal of producing efficient simulation models for a
specific event-driven simulator. In the present case, the target problem solver
is XDE and so the desirable characteristics of the target representation are
clear. This should provide a strong constraint on the relevant abstractions.
The FUNSTRUX approach might also work under the somewhat different
goal of producing temporally abstract behaviors. Finally, since the image
of an ordinary digital model under temporal abstractions can be viewed as
a reformulation into a specialized representation, the frameworks outlined
in  [Kramer87]  or  [VanBaalen88]  might  be  useful  ways  of  approaching  the 
problem. 

An  alternative  approach  would  be  to  start  doing  prediction  at  low  levels 
of  detail,  but  recognize  recurring  patterns  of  events  and  extrapolate  their 
cumulative  effects  over  large  stretches  of  time.  This  is  the  essence  of  the 
aggregation technique [Weld86]. There are at least two difficulties with this
approach:  (i)  recognizing  what  constitutes  a  “recurring”  pattern  of  events, 
that  is,  deciding  which  events  are  relevant  to  a  given  pattern,  and  (ii)  ensuring 
that  the  extrapolated  predictions  are  robust  against  fencepost  errors.  In 
spite  of  these  difficulties  it  bears  further  investigation  because  it  has  strong 
intuitive  appeal  —  people  seem  to  be  good  at  detecting  repetitive  sequences 
and  extrapolating  them  to  find  their  limits.  At  the  very  least,  aggregation 
should  be  a  useful  technique  for  generating  fault  syndromes  from  ordinary 
fault  simulations. 


8.3 Generalizing the Methodology

Digital circuit troubleshooting is a relatively narrow domain. To learn more
about model-based troubleshooting of complex systems in general it is
important to apply the technology to systems in a variety of domains. The eight
principles  of  modeling  for  troubleshooting  that  form  the  core  of  this  work 
are  briefly  discussed  below  in  the  context  of  local  area  computer  networks, 
automobile  engines,  and  internal  medicine. 

The eight principles can be used to suggest characteristics of a
representation for troubleshooting computer networks. First, there is a superficially
appealing analogy to be drawn between the structure of computer boards
and the structure of networks. Instead of chips connected by wires, there
are  hosts  connected  by  cables,  and  so  forth.  However,  one  of  the  principles 
dictates  that  the  elements  of  the  structure  should  correspond  to  failures  and 
repairs. On closer investigation of the domain it turns out that failures and
repairs  in  networks  only  rarely  have  a  physical  cause  such  as  a  broken  cable; 
the  most  typical  failures  are  crashed  server  processes  or  operating  systems, 
for  which  the  repair  involves  a  restart  operation.  This  indicates  that,  for 
example,  components  such  as  hosts  are  not  appropriate  physical  primitives, 
but  that  in  some  sense  server  processes  are.  Second,  tests  are  usually  de¬ 
signed  into  the  system  and  can  be  quite  cheap.  Some  networks,  for  example, 
provide  an  operation  that  allows  one  host  to  request  a  status  response  from 
every  host  on  its  subnetwork.  To  model  the  features  of  component  behaviors 
that  are  easiest  to  observe  means  that  for  the  most  part  only  the  behavior  of 
the  hosts  with  respect  to  these  test  operations  needs  to  be  modeled.  Finally, 
there  are  other  network  misbehaviors  for  which  the  principal  symptoms  are 
temporally  coarse  —  mail  servers,  for  example,  are  notorious  for  building 
up  enormous  queues  that  ultimately  result  in  slowed  response  times.  This 
suggests  that  appropriate  behavior  models  will  deal  not  with  the  movements 
of  individual  packets,  but  rather  with  temporally  abstract  features  such  as 
the  number  of  packets  and  their  rates  of  transmission.  All  of  the  principles 
for  constructing  temporally  abstract  behaviors  appear  to  apply  equally  well 
to  network  events  as  to  digital  events. 

There  are  few  obvious  analogies  between  automobile  engines  and  digital 
circuits, but from the special perspective of troubleshooting and the
principles of modeling for troubleshooting, engines have important similarities
to  circuits.  First,  they  are  manufactured  artifacts  that  are  repaired  by  re¬ 
placement  of  their  physical  parts.  Second,  the  easily  observed  features  of 
the  engine  behavior  are  temporally  coarse  compared  to  events  such  as  piston 
firings  and  crankshaft  rotations.  Some  of  these  temporally  coarse  features  de¬ 
scribe  the  behavior  of  subsystems  with  sequential  feedback:  for  example,  the 
distributor,  pistons,  crankshaft,  and  generator  form  a  feedback  loop  whose 
interesting  properties  are  its  revolutions  per  minute,  sputters,  vibrations, 
and  so  forth.  Third,  there  are  many  failure  modes  worth  modeling  explic¬ 
itly  either  because  they  are  common  or  because  they  drastically  simplify  the 
behavior  of  the  engine:  empty  gas  tanks,  dead  batteries,  disconnected  wires, 
and  so  forth.  While  it  would  surely  be  a  major  task  to  construct  a  full-blown 
model  of  an  internal  combustion  engine,  the  eight  principles  do  suggest  which 
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properties  of  engines  will  be  most  important  to  include  in  a  model  to  be  used 
for  troubleshooting. 

The methodology and principles in this work are most appropriate for
troubleshooting designed artifacts. An implicit assumption has been that
the modeler could in principle provide an arbitrarily detailed account of the
behavior of the system, while the modeling challenge is to make do with the
least detailed description that still works. This assumption does not apply
to human physiology and medicine; the challenge in these domains is to
produce any model. From the perspective of this work there are other important
differences as well. First, it is inappropriate to emphasize the representation
of physical structure. In medicine it is relatively rare that therapy consists
of physically isolated structural repairs (organ transplants notwithstanding).
Second, in medicine easily observed symptoms are uncorrelated with their
temporal extent. While it may be important to model explicitly what can be
observed, doing so generally has nothing to do with temporal abstractions. Third,
the criteria used to decide which circuit fault models to include are at best
incomplete for human diseases. For example, the short- and long-term
seriousness of the diseases should somehow be taken into account. A few of
the principles of modeling for troubleshooting might apply to subdomains of
medicine for which good analytic models exist, but only tangentially. For
example, in the multilevel physiological model in ABEL [Patil81], one of the
simplifications that distinguishes the abstract levels from the detailed ones
is that feedback loops are composed and summarized. In general, however,
there is as yet no compelling evidence that the principles of modeling for
troubleshooting will apply to modeling physiology for diagnosis.

Appendix A

Scenario Transcripts


The transcripts in Appendices A.1 through A.11 have three kinds of entries:

• “There are n diagnoses...” indicates the status of the troubleshooting
engine after each change to its set of diagnoses. The current top
diagnosis is shown with it.

• “Adding observation...” means that a new TINT assertion about the
value of some observable signal is being added.

• “Entropy Signal; Aliases...” shows the top-ranked probe:
its entropy, the internal name of the signal, two other nearby named
ports to help provide context, and finally the list of values predicted
there along with their labels.
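
To make the "entropy" figure in the status lines concrete, here is a minimal Python sketch of one way such a number could be computed from the probabilities of the competing diagnoses. It assumes the figure is Shannon entropy measured in bits, which is consistent with the transcripts (a single diagnosis always shows entropy 0.000) but is not stated in them. The function name `diagnosis_entropy` and the diagnosis labels are hypothetical.

```python
import math
from typing import Mapping


def diagnosis_entropy(weights: Mapping[str, float]) -> float:
    """Shannon entropy, in bits, of a set of competing diagnoses.

    `weights` maps a diagnosis label to a non-negative weight. The weights
    are normalized here, so they need not sum to one. Assumption: base-2
    entropy; the transcripts do not say which base the program used.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one diagnosis must have positive weight")
    entropy = 0.0
    for w in weights.values():
        if w > 0:
            p = w / total
            entropy -= p * math.log2(p)
    return entropy


# Hypothetical usage: one certain diagnosis versus four equally likely ones.
single = diagnosis_entropy({"U12 Other": 1.0})
uniform = diagnosis_entropy({"U10": 1.0, "U11": 1.0, "U12": 1.0, "U20": 1.0})
```

Under this reading, the entropy falls toward zero as probes eliminate competing diagnoses, which matches the way the figure reaches 0.000 at the end of each transcript.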

To produce the transcripts, the troubleshooting engine consulted an
oracle to get the result of its highest-ranked probe, just as if a human user
had typed in the same result. Some of the transcripts have a histogram
at the end that summarizes the sequence of probes made. The length of
each horizontal bar corresponds to the number of competing diagnoses
remaining after that probe. The
bracketed timestamps [hh:mm:ss] give a rough idea of the performance of
the troubleshooting program running on a Symbolics 3650 with 2 Mw; 16
minutes for one of the Audio Decoder examples is typical.

A.1 Clock Generator Example


There are 1 diagnoses (entropy 0.000) accounting for .96:

1.000 [[]]

[18:03:19] Adding observation of 1 at
[LL [HOLE 1 N167]]

Conflict! There are 3 diagnoses (entropy 1.303) accounting for .96:
0.630 [[(U36 Other)]]

There are 3 diagnoses (entropy 1.303) accounting for .96:

0.630 [[(U36 Other)]]

Refining U36 with OPEN

There are 3 diagnoses (entropy 1.306) accounting for .96:

0.636 [[(U36 Open)]]

Decomposing (#<ASSUMPTION +INF [STATUS-OF U36 WORKING]>)

There are 3 diagnoses (entropy 1.306) accounting for .96:

0.636 [[(U36 Open)]]

Decomposing (#<ASSUMPTION +INF [STATUS-OF U30 WORKING]>)

There are 3 diagnoses (entropy 1.306) accounting for .96:

0.636 [[(U36 Open)]]

Decomposing (#<ASSUMPTION +INF [STATUS-OF U33 WORKING]>)

There are 3 diagnoses (entropy 1.306) accounting for .96:

0.636 [[(U36 Open)]]


• • •

Entropy Signal; Aliases; Value-Environment Pairs

0.0431 [CHANGING-WRT 1000000000 10000000000 [LL [HOLE 1 N391]]]
aka [PIN 9 U36] aka [OUT 0 U36A]

((NIL #<ENV 3 06> #<ENV 1 010>) (T #<ENV 1 01>))

[10:06:36] Adding observation of T at
[CHANGING-WRT 1000000000 10000000000 [LL [HOLE 1 N391]]]

There are 3 diagnoses (entropy 0.097) accounting for .96:

0.633 [[(U30 Other)]]


Entropy Signal; Aliases; Value-Environment Pairs

0.0068 [CHANGING-WRT 1000000000 10000000000 [LL [HOLE 1 N306]]]
aka [PIN 3 U33] aka [OUT T U33A]

((T #<ENV 1 03>) (NIL #<ENV 1 04>))

[19:06:06] Adding observation of NIL at
[CHANGING-WRT 1000000000 10000000000 [LL [HOLE 1 N206]]]

• • •

There are 1 diagnoses (entropy 0.000) accounting for .96:
1.000 [[(US2 Other)]]

• • •

A.2 Audio Decoder Example I


There are 1 diagnoses (entropy 0.000) accounting for .96:

1.000 [[]]

[14:00:13] Adding observation of 0 at
[MAX-MIN-W 100000000 [VOLTAGE [HOLE 1 N272]]]

• • •

There are 10 diagnoses (entropy 3.200) accounting for .96:
0.103 [[(U43 Other)]]

Decomposing (#<ASSUMPTION +INF [STATUS-OF U43 WORKING]>)

There are 10 diagnoses (entropy 3.200) accounting for .96:
0.103 [[(U43 Other)]]

Decomposing (#<ASSUMPTION +INF [STATUS-OF U12 WORKING]>)

There are 10 diagnoses (entropy 3.200) accounting for .96:
0.103 [[(U43 Other)]]

Decomposing (#<ASSUMPTION +INF [STATUS-OF U44 WORKING]>)

• • •

There are 10 diagnoses (entropy 3.200) accounting for .96:
0.103 [[(U43 Other)]]

Decomposing (#<ASSUMPTION +INF [STATUS-OF U44 WORKING]>)

• • •

There are 10 diagnoses (entropy 3.200) accounting for .96:
0.103 [[(U43 Other)]]

Decomposing (#<ASSUMPTION +INF [STATUS-OF U44 WORKING]>)

There are 10 diagnoses (entropy 3.200) accounting for .96:
0.103 [[(U43 Other)]]

Decomposing (#<ASSUMPTION +INF [STATUS-OF U21 WORKING]>)

There are 10 diagnoses (entropy 3.200) accounting for .96:
0.103 [[(U43 Other)]]


Entropy Signal; Aliases; Value-Environment Pairs

0.0201 [CHANGING-WRT 0 10000000000 [LL [HOLE 2 N290]]]
aka [PIN 10 U43] aka [IN CS U43A]

((T #<ENV 6 01307>))

[14:10:00] Adding observation of 1 at
[LL [HOLE 2 N290]]

Conflict! There are 6 diagnoses (entropy 2.566) accounting for .96:
0.222 [[(U12 Other)]]




There are 6 diagnoses (entropy 2.288) accounting for .96:
0.268 [[(U12 Other)]]

Decomposing (#<ASSUMPTION +INF [STATUS-OF U10 WORKING]>)

• • •

There are 6 diagnoses (entropy 2.288) accounting for .96:
0.268 [[(U12 Other)]]

Decomposing (#<ASSUMPTION +INF [STATUS-OF U11 WORKING]>)

There are 6 diagnoses (entropy 2.288) accounting for .96:
0.263 [[(U12 Other)]]

Decomposing (#<ASSUMPTION +INF [STATUS-OF U20 WORKING]>)

There are 6 diagnoses (entropy 2.288) accounting for .96:
0.268 [[(U12 Other)]]


• • •

Entropy Signal; Aliases; Value-Environment Pairs

0.7167 [CHANGING-WRT 0 10000000000 [LL [HOLE 1 N88]]]
aka [PIN 14 U21] aka [BI 3 U21A]

((T #<ENV 3 0320>))

[14:12:08] Adding observation of 1 at
[LL [HOLE 1 N88]]

• • •

There are 2 diagnoses (entropy 0.021) accounting for .96:
0.668 [[(U12 Other)]]


• • •

Entropy Signal; Aliases; Value-Environment Pairs

0.9164 [CHANGING-WRT 0 10000000000 [LL [HOLE 4 N66]]]
aka [PIN 2 U11] aka [IN CLK U11A]

((T #<ENV 1 0200>))

[14:12:43] Adding observation of 1 at
[LL [HOLE 4 N66]]

There are 1 diagnoses (entropy 0.000) accounting for .96:
1.000 [[(U12 Other)]]

Probes   Diagnoses
(Four)   afterwards
N272     ##########   10
N290     ######        6
N88      ##            2
N66      #             1

A.3 Audio Decoder Example I with Syndromes

There are 1 diagnoses (entropy 0.000) accounting for .96:

1.000 [[]]

[14:26:10] Adding observation of 0 at
[MAX-MIN-W 100000000 [VOLTAGE [HOLE 1 N272]]]

There are 10 diagnoses (entropy 3.269) accounting for .96:

0.169 [[(U43 Other)]]

Refining U12 with INACTIVE

Conflict! There are 10 diagnoses (entropy 3.203) accounting for .96:
0.163 [[(U43 Other)]]

There are 10 diagnoses (entropy 3.203) accounting for .96:

0.163 [[(U43 Other)]]

Refining U11 with CSB-INACTIVE

Conflict! There are 11 diagnoses (entropy 3.320) accounting for .96:
0.163 [[(U43 Other)]]

There are 11 diagnoses (entropy 3.320) accounting for .96:

0.163 [[(U43 Other)]]

Refining U10 with CSB-INACTIVE

• • •

Conflict! There are 11 diagnoses (entropy 3.282) accounting for .96:
0.163 [[(U43 Other)]]

• • •

There are 11 diagnoses (entropy 3.282) accounting for .96:

0.163 [[(U43 Other)]]

Decomposing (#<ASSUMPTION +INF [STATUS-OF U43 WORKING]>)

There are 11 diagnoses (entropy 3.282) accounting for .96:

0.163 [[(U43 Other)]]

Decomposing (#<ASSUMPTION +INF [STATUS-OF U21 WORKING]>)

There are 11 diagnoses (entropy 3.282) accounting for .96:

0.163 [[(U43 Other)]]

Decomposing (#<ASSUMPTION +INF [STATUS-OF U21 WORKING]>)

There are 11 diagnoses (entropy 3.282) accounting for .96:

0.163 [[(U43 Other)]]

Decomposing (#<ASSUMPTION +INF [STATUS-OF U21 WORKING]>)

There are 11 diagnoses (entropy 3.282) accounting for .96:
0.163 [[(U43 Other)]]

Decomposing (#<ASSUMPTION +INF [STATUS-OF U44 WORKING]>)

• • •

There are 11 diagnoses (entropy 3.282) accounting for .96:
0.163 [[(U43 Other)]]

Entropy Signal; Aliases; Value-Environment Pairs

0.8293 [CHANGING-WRT 0 10000000000 [LL [HOLE 2 N290]]]
aka [PIN 10 U43] aka [IN CS U43A]

((T #<ENV 6 01307>))

[14:30:37] Adding observation of 1 at
[LL [HOLE 2 N290]]

Conflict! There are 8 diagnoses (entropy 2.864) accounting for .96:
0.166 [[(U12 Other)]]

• • •

Conflict! There are 6 diagnoses (entropy 2.634) accounting for .96:
0.203 [[(U12 Other)]]

• • •

There are 6 diagnoses (entropy 2.634) accounting for .96:

0.203 [[(U12 Other)]]

Refining U20 with CSB-INACTIVE

• • •

Conflict! There are 6 diagnoses (entropy 2.632) accounting for .96:
0.218 [[(U12 Other)]]

• • •

There are 6 diagnoses (entropy 2.632) accounting for .96:

0.218 [[(U12 Other)]]

Decomposing (#<ASSUMPTION +INF [STATUS-OF U12 WORKING]>)

There are 6 diagnoses (entropy 2.632) accounting for .96:

0.218 [[(U12 Other)]]

Decomposing (#<ASSUMPTION +INF [STATUS-OF U10 WORKING]>)

There are 6 diagnoses (entropy 2.632) accounting for .96:

0.218 [[(U12 Other)]]

Decomposing (#<ASSUMPTION +INF [STATUS-OF U11 WORKING]>)

There are 6 diagnoses (entropy 2.632) accounting for .96:

0.218 [[(U12 Other)]]

Decomposing (#<ASSUMPTION +INF [STATUS-OF U20 WORKING]>)

There are 6 diagnoses (entropy 2.632) accounting for .96:

0.218 [[(U12 Other)]]

Entropy Signal; Aliases; Value-Environment Pairs

1.0000 [LL [HOLE 1 N213]]

aka [PIN 15 U10] aka [OUT TC U10A]

((0 #<ENV 1 04> #<ENV 3 04102> #<ENV 3 020102>))

[14:34:13] Adding observation of 0 at
[LL [HOLE 1 N213]]

There are 6 diagnoses (entropy 2.532) accounting for .96:
0.210 [[(U12 Other)]]


Entropy Signal; Aliases; Value-Environment Pairs

1.0000 [LL [HOLE 1 N50]]

aka [PIN 10 U12] aka [OUT Z U12A]

((0 #<ENV 2 04100> #<ENV 2 010100> #<ENV 2 020100>))

[14:36:12] Adding observation of 1 at
[LL [HOLE 1 N50]]

There are 2 diagnoses (entropy 0.001) accounting for .96:
0.701 [[(U12 Other)]]


Entropy Signal; Aliases; Value-Environment Pairs

Probes   Diagnoses
(Four)   afterwards
N272     ###########  11
N290     ######        6
N213     ######        6
N50      ##            2

A.4 Audio Decoder Example II


There are 1 diagnoses (entropy 0.000) accounting for .96:

1.000 [[]]

[16:00:47] Adding observation of 0 at

[MAX-MIN-W 100000000 [VOLTAGE [HOLE 1 N272]]]

There are 10 diagnoses (entropy 3.200) accounting for .96:
0.103 [[(U43 Other)]]

Decomposing (#<ASSUMPTION +INF [STATUS-OF U43 WORKING]>)

There are 10 diagnoses (entropy 3.260) accounting for .96:
0.163 [[(U43 Other)]]

Decomposing (#<ASSUMPTION +INF [STATUS-OF U12 WORKING]>)

There are 10 diagnoses (entropy 3.260) accounting for .96:
0.163 [[(U43 Other)]]

Decomposing (#<ASSUMPTION +INF [STATUS-OF U44 WORKING]>)

• • •

There are 10 diagnoses (entropy 3.260) accounting for .96:
0.163 [[(U43 Other)]]

Decomposing (#<ASSUMPTION +INF [STATUS-OF U44 WORKING]>)

• • •

There are 10 diagnoses (entropy 3.260) accounting for .96:
0.163 [[(U43 Other)]]

Decomposing (#<ASSUMPTION +INF [STATUS-OF U44 WORKING]>)

There are 10 diagnoses (entropy 3.260) accounting for .96:
0.163 [[(U43 Other)]]

Decomposing (#<ASSUMPTION +INF [STATUS-OF U21 WORKING]>)

There are 10 diagnoses (entropy 3.260) accounting for .96:
0.163 [[(U43 Other)]]


Entropy Signal; Aliases; Value-Environment Pairs

0.0201 [CHANGING-WRT 0 10000000000 [LL [HOLE 2 N290]]]
aka [PIN 10 U43] aka [IN CS U43A]

((T #<ENV 6 01307>))

[16:12:34] Adding observation of 1 at
[LL [HOLE 2 N290]]

Conflict! There are 6 diagnoses (entropy 2.666) accounting for .96:
0.222 [[(U12 Other)]]

There are 6 diagnoses (entropy 2.299) accounting for .96:
0.293 [[(U12 Other)]]

Decomposing (#<ASSUMPTION +INF [STATUS-OF U10 WORKING]>)

There are 6 diagnoses (entropy 2.299) accounting for .96:
0.293 [[(U12 Other)]]

Decomposing (#<ASSUMPTION +INF [STATUS-OF U11 WORKING]>)

There are 6 diagnoses (entropy 2.299) accounting for .96:
0.293 [[(U12 Other)]]

Decomposing (#<ASSUMPTION +INF [STATUS-OF U20 WORKING]>)

There are 6 diagnoses (entropy 2.299) accounting for .96:
0.293 [[(U12 Other)]]


• • •

Entropy Signal; Aliases; Value-Environment Pairs

0.7197 [CHANGING-WRT 0 10000000000 [LL [HOLE 2 N239]]]
aka [PIN 14 U44] aka [BI 3 U44A]

((T #<ENV 4 0390>))


[16:14:42] Adding observation of T at
[CHANGING-WRT 0 10000000000 [LL [HOLE 2 N239]]]

• • •

There are 6 diagnoses (entropy 2.299) accounting for .96:
0.293 [[(U12 Other)]]


Entropy Signal; Aliases; Value-Environment Pairs

0.7197 [CHANGING-WRT 0 10000000000 [LL [HOLE 2 N117]]]
aka [PIN 6 U44] aka [BI 4 U44A]

((T #<ENV 4 0390>))

[16:15:67] Adding observation of T at
[CHANGING-WRT 0 10000000000 [LL [HOLE 2 N117]]]

• • •

There are 6 diagnoses (entropy 2.289) accounting for .96:
0.293 [[(U12 Other)]]


Entropy Signal; Aliases; Value-Environment Pairs

0.7197 [CHANGING-WRT 0 10000000000 [LL [HOLE 2 N208]]]
aka [PIN 24 U43] aka [IN 7 U43A]

((T #<ENV 3 0320>))

[16:17:09] Adding observation of T at
[CHANGING-WRT 0 10000000000 [LL [HOLE 2 N208]]]

There are 6 diagnoses (entropy 2.288) accounting for .96:
0.293 [[(U12 Other)]]


• • •

Entropy Signal; Aliases; Value-Environment Pairs

0.7167 [CHANGING-WRT 0 10000000000 [LL [HOLE 1 N289]]]
aka [PIN 16 U21] aka [BI 6 U21A]

((T #<ENV 3 0320>))

[16:18:00] Adding observation of T at
[CHANGING-WRT 0 10000000000 [LL [HOLE 1 N289]]]

• • •

There are 6 diagnoses (entropy 2.298) accounting for .96:
0.263 [[(U12 Other)]]


• • •

Entropy Signal; Aliases; Value-Environment Pairs

0.7167 [CHANGING-WRT 0 10000000000 [LL [HOLE 2 N48]]]
aka [PIN 28 U43] aka [IN 11 U43A]

((T #<ENV 3 0320>))


[16:18:61] Adding observation of T at
[CHANGING-WRT 0 10000000000 [LL [HOLE 2 N48]]]


• • •

There are 6 diagnoses (entropy 2.288) accounting for .96:
0.263 [[(U12 Other)]]


• • •

Entropy Signal; Aliases; Value-Environment Pairs

0.6619 [CHANGING-WRT 0 10000000000 [LL [HOLE 3 N260]]]
aka [PIN 8 U20] aka [IN A U20C]

((T #<ENV 1 0200>))

[16:19:42] Adding observation of T at
[CHANGING-WRT 0 10000000000 [LL [HOLE 3 N260]]]


There are 6 diagnoses (entropy 2.288) accounting for .96:
0.263 [[(U12 Other)]]


Entropy Signal; Aliases; Value-Environment Pairs

0.6619 [CHANGING-WRT 0 10000000000 [LL [HOLE 2 N232]]]
aka [PIN 11 U43] aka [IN VK U43A]

((T #<ENV 3 01210>))

[16:20:34] Adding observation of T at
[CHANGING-WRT 0 10000000000 [LL [HOLE 2 N232]]]


There are 6 diagnoses (entropy 2.288) accounting for .96:
0.263 [[(U12 Other)]]

Entropy Signal; Aliases; Value-Environment Pairs

0.4434 [CHANGING-WRT 0 10000000000 [LL [HOLE 2 N213]]]
aka [PIN 3 U20] aka [IN I U20A]

((NIL #<ENV 1 04>))

[16:21:23] Adding observation of T at
[CHANGING-WRT 0 10000000000 [LL [HOLE 2 N213]]]

Conflict! There are 1 diagnoses (entropy 0.000) accounting for .96:

1.000 [[(U20 Other)]]

There are 1 diagnoses (entropy 0.000) accounting for .96:

1.000 [[(U20 Other)]]

Probes   Diagnoses
(Ten)    afterwards
N272     ##########   10
N290     ######        6
N239     ######        6
N117     ######        6
N208     ######        6
N289     ######        6
N48      ######        6
N260     ######        6
N232     ######        6
N213     #             1

A.5 Audio Decoder Example II with Syndromes


There are 1 diagnoses (entropy 0.000) accounting for .96:

1.000 [[]]

[14:48:46] Adding observation of 0 at

[MAX-MIN-W 100000000 [VOLTAGE [HOLE 1 N272]]]

There are 10 diagnoses (entropy 3.260) accounting for .96:

0.163 [[(U43 Other)]]

Refining U12 with INACTIVE

Conflict! There are 10 diagnoses (entropy 3.203) accounting for .96:
0.163 [[(U43 Other)]]

There are 10 diagnoses (entropy 3.203) accounting for .96:

0.163 [[(U43 Other)]]

Refining U11 with CSB-INACTIVE

Conflict! There are 11 diagnoses (entropy 3.320) accounting for .96:
0.163 [[(U43 Other)]]

There are 11 diagnoses (entropy 3.320) accounting for .96:

0.163 [[(U43 Other)]]

Refining U10 with CSB-INACTIVE

Conflict! There are 11 diagnoses (entropy 3.282) accounting for .96:
0.163 [[(U43 Other)]]

• • •

There are 11 diagnoses (entropy 3.282) accounting for .96:

0.183 [[(U43 Other)]]

Decomposing (#<ASSUMPTION +INF [STATUS-OF U43 WORKING]>)

There are 11 diagnoses (entropy 3.282) accounting for .96:

0.163 [[(U43 Other)]]

Decomposing (#<ASSUMPTION +INF [STATUS-OF U21 WORKING]>)

There are 11 diagnoses (entropy 3.282) accounting for .96:

0.163 [[(U43 Other)]]

Decomposing (#<ASSUMPTION +INF [STATUS-OF U21 WORKING]>)

There are 11 diagnoses (entropy 3.282) accounting for .96:

0.163 [[(U43 Other)]]

Decomposing (#<ASSUMPTION +INF [STATUS-OF U21 WORKING]>)

There are 11 diagnoses (entropy 3.282) accounting for .96:
0.163 [[(U43 Other)]]

• • •

Decomposing (#<ASSUMPTION +INF [STATUS-OF U44 WORKING]>)

There are 11 diagnoses (entropy 3.282) accounting for .96:
0.183 [[(U43 Other)]]


• • •

Entropy Signal; Aliases; Value-Environment Pairs

0.8293 [CHANGING-WRT 0 10000000000 [LL [HOLE 2 N290]]]
aka [PIN 10 U43] aka [IN CS U43A]

((T #<ENV 6 01307>))

[14:63:23] Adding observation of 1 at

[LL [HOLE 2 N290]]

Conflict! There are 8 diagnoses (entropy 2.864) accounting for .96:
0.166 [[(U12 Other)]]

• • •

Conflict! There are 6 diagnoses (entropy 2.634) accounting for .96:
0.203 [[(U12 Other)]]

• • •

There are 6 diagnoses (entropy 2.634) accounting for .96:

0.203 [[(U12 Other)]]

Refining U20 with CSB-INACTIVE

Conflict! There are 6 diagnoses (entropy 2.632) accounting for .96:
0.218 [[(U12 Other)]]

There are 6 diagnoses (entropy 2.632) accounting for .96:

0.218 [[(U12 Other)]]

Decomposing (#<ASSUMPTION +INF [STATUS-OF U12 WORKING]>)

There are 6 diagnoses (entropy 2.632) accounting for .96:

0.218 [[(U12 Other)]]

Decomposing (#<ASSUMPTION +INF [STATUS-OF U10 WORKING]>)

There are 6 diagnoses (entropy 2.632) accounting for .96:

0.218 [[(U12 Other)]]

Decomposing (#<ASSUMPTION +INF [STATUS-OF U11 WORKING]>)

There are 6 diagnoses (entropy 2.632) accounting for .96:

0.218 [[(U12 Other)]]

Decomposing (#<ASSUMPTION +INF [STATUS-OF U20 WORKING]>)

There are 6 diagnoses (entropy 2.632) accounting for .96:

0.218 [[(U12 Other)]]

Entropy Signal; Aliases; Value-Environment Pairs

1.0000 [LL [HOLE 1 N213]]

aka [PIN 15 U10] aka [OUT TC U10A]

((0 #<ENV 1 04> #<ENV 3 04102> #<ENV 3 020102>))

[14:66:47] Adding observation of T at
[CHANGING-WRT 0 10000000000 [LL [HOLE 1 N213]]]

Conflict! There are 1 diagnoses (entropy 0.010) accounting for .96:
0.003 [[(U20 Other)]]

There are 1 diagnoses (entropy 0.010) accounting for .96:

0.003 [[(U20 Other)]]

•  •  • 

Probes   Diagnoses
(Three)  afterwards
N272     ###########  11
N290     ######        6
N213     #             1

A.6 Audio Decoder Example III

There are 1 diagnoses (entropy 0.000) accounting for .96:

1.000 [[]]

[11:00:23] Adding observation of 30 at
[MAX-MIN-W 100000000 [VOLTAGE [HOLE 1 N272]]]

There are 1 diagnoses (entropy 0.000) accounting for .96:

1.000 [[]]

[11:00:32] Adding observation of 2000.0 at
[FW 60000000 '(NIL T) [CROSS (UP? 2 11) [VOLTAGE [HOLE 1 N272]]]]

• • •

There are 1 diagnoses (entropy 0.000) accounting for .96:

1.000 [[]]

[11:00:37] Adding observation of 20000.0 at
[FW 60000000 '(NIL T) [CROSS 0 [DT [VOLTAGE [HOLE 1 N272]]]]]

• • •

There are 10 diagnoses (entropy 3.200) accounting for .96:

0.103 [[(U43 Other)]]

Decomposing (#<ASSUMPTION +INF [STATUS-OF U43 WORKING]>)

• • •

There are 10 diagnoses (entropy 3.200) accounting for .96:

0.103 [[(U43 Other)]]

Decomposing (#<ASSUMPTION +INF [STATUS-OF U12 WORKING]>)

There are 10 diagnoses (entropy 3.200) accounting for .96:

0.103 [[(U43 Other)]]

Decomposing (#<ASSUMPTION +INF [STATUS-OF U44 WORKING]>)

• • •

There are 10 diagnoses (entropy 3.200) accounting for .96:

0.103 [[(U43 Other)]]

Decomposing (#<ASSUMPTION +INF [STATUS-OF U44 WORKING]>)

There are 10 diagnoses (entropy 3.200) accounting for .96:

0.103 [[(U43 Other)]]

Decomposing (#<ASSUMPTION +INF [STATUS-OF U44 WORKING]>)

There are 10 diagnoses (entropy 3.200) accounting for .96:

0.103 [[(U43 Other)]]

Decomposing (#<ASSUMPTION +INF [STATUS-OF U21 WORKING]>)

There are 10 diagnoses (entropy 3.200) accounting for .96:

0.103 [[(U43 Other)]]

Entropy Signal; Aliases; Value-Environment Pairs

0.8301 [CHANGING-WRT 0 10000000000 [LL [HOLE 3 N390]]]
aka [PIN 10 U43] aka [IN CS U43A]

((T #<ENV 6 01307>))

[11:04:43] Adding observation of T at
[CHANGING-WRT 0 10000000000 [LL [HOLE 3 N300]]]

• • •

There are 10 diagnoses (entropy 3.269) accounting for .96:
0.163 [[(U43 Other)]]

• • •

Entropy Signal; Aliases; Value-Environment Pairs

0.7617 [CHANGING-WRT 0 10000000000 [LL [HOLE 3 N380]]]
aka [PIN 10 U10] aka [IN ENBT U10A]

((T #<ENV 5 0307>))

[11:06:43] Adding observation of T at
[CHANGING-WRT 0 10000000000 [LL [HOLE 3 N380]]]

• • •

There are 10 diagnoses (entropy 3.269) accounting for .96:
0.163 [[(U43 Other)]]


Entropy Signal; Aliases; Value-Environment Pairs

0.7617 [CHANGING-WRT 0 10000000000 [LL [HOLE 1 N139]]]
aka [PIN 11 U10] aka [OUT 3 U10A]

((T #<ENV 6 0307>))

[11:06:43] Adding observation of T at
[CHANGING-WRT 0 10000000000 [LL [HOLE 1 N139]]]

There are 10 diagnoses (entropy 3.269) accounting for .96:
0.163 [[(U43 Other)]]


Entropy Signal; Aliases; Value-Environment Pairs

0.7388 [CHANGING-WRT 0 10000000000 [LL [HOLE 3 N113]]]
aka [PIN 31 U43] aka [IN 4 U43A]

((T #<ENV 4 0S60>))

[11:07:43] Adding observation of T at
[CHANGING-WRT 0 10000000000 [LL [HOLE 3 N113]]]

There are 10 diagnoses (entropy 3.269) accounting for .96:
0.163 [[(U43 Other)]]


Entropy Signal; Aliases; Value-Environment Pairs

0.6980 [CHANGING-WRT 0 10000000000 [LL [HOLE 1 N88]]]
aka [PIN 14 U21] aka [BI 3 U21A]

((T #<ENV 3 0320>))

[11:08:43] Adding observation of T at
[CHANGING-WRT 0 10000000000 [LL [HOLE 1 N88]]]

• • •

There are 10 diagnoses (entropy 3.289) accounting for .96:
0.183 [[(U43 Other)]]


Entropy Signal; Aliases; Value-Environment Pairs

0.6980 [CHANGING-WRT 0 10000000000 [LL [HOLE 2 N48]]]
aka [PIN 28 U43] aka [IN 11 U43A]

((T #<ENV 3 0320>))

[11:09:43] Adding observation of T at
[CHANGING-WRT 0 10000000000 [LL [HOLE 2 N48]]]

There are 10 diagnoses (entropy 3.269) accounting for .96:
0.163 [[(U43 Other)]]


• • •

Entropy Signal; Aliases; Value-Environment Pairs

0.6830 [CHANGING-WRT 0 10000000000 [LL [HOLE 1 N232]]]
aka [PIN 8 U22] aka [OUT T U22C]

((T #<ENV 3 01210>))

[11:10:43] Adding observation of T at
[CHANGING-WRT 0 10000000000 [LL [HOLE 1 N232]]]

• • •

There are 10 diagnoses (entropy 3.269) accounting for .96:
0.163 [[(U43 Other)]]


Entropy Signal; Aliases; Value-Environment Pairs

0.4270 [CHANGING-WRT 0 10000000000 [LL [HOLE 3 N169]]]
aka [PIN 19 U43] aka [IN 2 U43A]

((T #<ENV 2 0140>))

[11:11:46] Adding observation of T at
[CHANGING-WRT 0 10000000000 [LL [HOLE 3 N169]]]

There are 10 diagnoses (entropy 3.269) accounting for .96:
0.163 [[(U43 Other)]]


Entropy Signal; Aliases; Value-Environment Pairs

0.4270 [CHANGING-WRT 0 10000000000 [LL [HOLE 1 N208]]]
aka [PIN 6 U21] aka [BI 2 U21A]

((T #<ENV 2 0120>))

[11:12:46] Adding observation of T at
[CHANGING-WRT 0 10000000000 [LL [HOLE 1 N208]]]

There are 10 diagnoses (entropy 3.260) accounting for .96:
0.163 [[(U43 Other)]]


Entropy Signal; Aliases; Value-Environment Pairs

0.4270 [CHANGING-WRT 0 10000000000 [LL [HOLE 3 N194]]]
aka [PIN 20 U43] aka [IN 3 U43A]

((T #<ENV 2 0140>))

[11:13:47] Adding observation of T at
[CHANGING-WRT 0 10000000000 [LL [HOLE 3 N104]]]

There are 10 diagnoses (entropy 3.269) accounting for .96:
0.163 [[(U43 Other)]]


• • •

Entropy Signal; Aliases; Value-Environment Pairs

0.4270 [CHANGING-WRT 0 10000000000 [LL [HOLE 3 N131]]]
aka [PIN 22 U43] aka [IN 6 U43A]

((T #<ENV 2 0120>))

[11:14:40] Adding observation of T at
[CHANGING-WRT 0 10000000000 [LL [HOLE 3 N131]]]


There are 10 diagnoses (entropy 3.269) accounting for .96:
0.163 [[(U43 Other)]]


• • •

Entropy Signal; Aliases; Value-Environment Pairs

0.4270 [CHANGING-WRT 0 10000000000 [LL [HOLE 2 N139]]]
aka [PIN 23 U43] aka [IN 6 U43A]

((T #<ENV 2 0120>))

[11:16:62] Adding observation of T at
[CHANGING-WRT 0 10000000000 [LL [HOLE 2 N139]]]


There are 10 diagnoses (entropy 3.269) accounting for .96:
0.163 [[(U43 Other)]]


Entropy Signal; Aliases; Value-Environment Pairs

0.4270 [CHANGING-WRT 0 10000000000 [LL [HOLE 1 N246]]]
aka [PIN 6 U21] aka [BI 4 U21A]

((T #<ENV 2 0120>))

[11:10:53] Adding observation of 1 at
[LL [HOLE 1 N240]]

Conflict! There are 2 diagnoses (entropy 0.018) accounting for .96:
0.067 [[(U21 Other)]]

There are 2 diagnoses (entropy 0.018) accounting for .96:

0.007 [[(U21 Other)]]


Entropy Signal; Aliases; Value-Environment Pairs

0.0408 [CHANGING-WRT 0 10000000000 [LL [HOLE 2 N236]]]
aka [PIN 14 U44] aka [BI 3 U44A]

((T #<ENV 2 0140>))

[11:17:44] Adding observation of T at
[CHANGING-WRT 0 10000000000 [LL [HOLE 2 N236]]]

There are 2 diagnoses (entropy 0.018) accounting for .96:
0.007 [[(U21 Other)]]


• • •

Entropy Signal; Aliases; Value-Environment Pairs

0.6408 [CHANGING-WRT 0 10000000000 [LL [HOLE 2 N117]]]
aka [PIN 6 U44] aka [BI 4 U44A]

((T #<ENV 2 0140>))

[11:18:48] Adding observation of T at
[CHANGING-WRT 0 10000000000 [LL [HOLE 2 N117]]]

• • •

There are 2 diagnoses (entropy 0.018) accounting for .96:
0.007 [[(U21 Other)]]


Entropy Signal; Aliases; Value-Environment Pairs

0.0408 [CHANGING-WRT 0 10000000000 [LL [HOLE 6 N140]]]
aka [PIN 10 U11] aka [IN ENBT U11A]

((NIL #<ENV 1 0100>))

[11:10:30] Adding observation of 1 at
[LL [HOLE 6 N140]]


There are 2 diagnoses (entropy 0.018) accounting for .96:
0.067 [[(U21 Other)]]


Entropy Signal; Aliases; Value-Environment Pairs

0.6408 [CHANGING-WRT 0 10000000000 [LL [HOLE 3 N264]]]
aka [PIN 0 U21] aka [IN CUAE U21A]

((NIL #<ENV 1 0100>))

[11:20:14] Adding observation of 1 at
[LL [HOLE 3 N264]]

Conflict! There are 1 diagnoses (entropy 0.000) accounting for .96:
1.000 [[(U21 Other)]]

There are 1 diagnoses (entropy 0.000) accounting for .96:

1.000 [[(U21 Other)]]


[Chart:  Probes  (Twenty)  versus  Diagnoses  afterwards]


A.7  Audio  Decoder  Example  III  with  Syndromes


There  are  1  diagnoses  (entropy  0.000)  accounting  for  .06:

1.000  [[]]

[10:30:04]  Adding  observation  of  30  at
[MAX- Nil- W  100000000  [T0LTA8I  [NOLI  1  1272]]] 

There  are  1  diagnoses  (entropy  0.000)  accounting  for  .06:

1.000  [[]]

[10:30:12]  Adding  observation  of  2000.0  at
[FVV  60000000  '(.IIL  T)  [CUSS  (HPT  2  11)  [Y0LTA8E  [HOLE  1  1272]]]] 

There  are  1  diagnoses  (entropy  0.000)  accounting  for  .06:

1.000  [[]]

[10:30:17]  Adding  observation  of  20000.0  at
[FVW  60000000  *(«IL  T)  [CUSS  0  [DT  [VOLTAGE  [HOLE  1  K272]]]]] 

•  •  •

There  are  10  diagnoses  (entropy  3.200)  accounting  for  .06:

0.163  [[(U43  Other)]]

Refining  U12  with  INACTIVE

•  •  •

Conflict!  There  are  10  diagnoses  (entropy  3.203)  accounting  for  .06:
0.103  [[(U43  Other)]]

•  •  •

There  are  10  diagnoses  (entropy  3.203)  accounting  for  .06:

0.103  [[(U43  Other)]]

Refining  U11  with  CSB-INACTIVE

Conflict!  There  are  11  diagnoses  (entropy  3.320)  accounting  for  .06:
0.103  [[(U43  Other)]]

Conflict!  There  are  10  diagnoses  (entropy  3.200)  accounting  for  .06:
0.100  [[(U43  Other)]]

There  are  10  diagnoses  (entropy  3.200)  accounting  for  .06:

0.160  [[(U43  Other)]]

Refining  U10  with  CSB-INACTIVE

Conflict!  There  are  10  diagnoses  (entropy  3.172)  accounting  for  .06:
0.100  [[(U43  Other)]]

There  are  10  diagnoses  (entropy  3.172)  accounting  for  .06:

0.160  [[(U43  Other)]]

Decomposing  (#<ASSUMPTION  +INF  [STATUS-OF  U43  WORKING]>)

There  are  10  diagnoses  (entropy  3.172)  accounting  for  .06:
0.160  [[(U43  Other)]]

Decomposing  (#<ASSUMPTION  +INF  [STATUS-OF  U21  WORKING]>)

There  are  10  diagnoses  (entropy  3.172)  accounting  for  .06:
0.168  [[(U43  Other)]]

Decomposing  (#<ASSUMPTION  +INF  [STATUS-OF  U21  WORKING]>)

There  are  10  diagnoses  (entropy  3.172)  accounting  for  .06:
0.168  [[(U43  Other)]]

Decomposing  (#<ASSUMPTION  +INF  [STATUS-OF  U21  WORKING]>)

There  are  10  diagnoses  (entropy  3.172)  accounting  for  .06:
0.168  [[(U43  Other)]]

Decomposing  (#<ASSUMPTION  +INF  [STATUS-OF  U44  WORKING]>)

There  are  10  diagnoses  (entropy  3.172)  accounting  for  .06:
0.166  [[(U43  Other)]]


Entropy  Signal;  Aliases;  Value-Environment  Pairs

0.8173  [CHAIOXie-WT  0  10000000000  [U  [H0L1  2  1200]]] 
aka  [PU  10  043]  aka  CXI  CS  043 A] 

((T  KBT  6  01307>}) 

[10:36:00]  Adding  observation  of  T  at 
[ClAieZK-VkT  0  10000000000  CIX  [HOLK  2  1200]]] 

There  are  10  diagnoses  (entropy  3.172)  accounting  for  .06:
0.168  [[(U43  Other)]]


•  •  •

Entropy  Signal;  Aliases;  Value-Environment  Pairs

0.7460  [CHA*8II0-fikT  0  10000000000  Ctt  [SOU  1  1120]]] 
aka  [PU  11  010]  aka  [OUT  3  010  A] 

((T  KBT  6  0307>)) 

[10:37:13]  Adding  observation  of  T  at 
[CHAMXM-WT  0  10000000000  [LL  [90 LZ  1  1128]]] 

There  are  10  diagnoses  (entropy  3.172)  accounting  for  .86:
0.168  [[(U43  Other)]]


•  •  •

Entropy  Signal;  Aliases;  Value-Environment  Pairs

0.7460  [CBAI8U6-V1T  0  10000000000  [U  [SOLI  2  1280]]] 
Hu  CPU  10  010]  aka  CXV  KJTBT  0101]

((T  #<UT  5  0S07>)) 

[10:30:19]  Adding  observation  of  T  at
[CHAIOIIS-VkT  0  10000000000  [LL  [BO LI  2  1280]]] 

There  are  10  diagnoses  (entropy  3.172)  accounting  for  .00:
0.108  [[(U43  Other)]]


Entropy  Signal;  Aliases;  Value-Environment  Pairs

0.7093  CCBIXOXIO-WKT  0  10000000000  [LL  [HOLS  1  1230]]] 
aka  [PXI  10  880]  aka  [BX  10  3801] 

((T  #<MT  4  0300>» 

[10:39:19]  Adding  observation  of  T  at
[CHABQXB6-VBT  0  10000000000  [LL  [HOLE  1  8230]]] 

•  •  •

There  are  10  diagnoses  (entropy  3.172)  accounting  for  .98:
0.108  [[(U43  Other)]]


•  •  •

Entropy  Signal;  Aliases;  Value-Environment  Pairs

0.7093  [CH1B6XBQ-H8T  0  10000000000  [LL  [HOLE  3  8117]]] 
aka  [PI8  18  043]  aka  [XB  1  0431] 

((T  #<EBV  4  0380>) ) 

[10:40:22]  Adding  observation  of  T  at
[CBA80X80-VET  0  10000000000  [LL  [BOLE  3  8117]]] 

There  are  10  diagnoses  (entropy  3.172)  accounting  for  .96:
0.188  [[(U43  Other)]]


Entropy  Signal;  Aliases;  Value-Environment  Pairs

0.7093  [CHAB6I86-URT  0  10000000000  [LL  [HOLE  2  8194]]] 
aka  [PZ8  4  044]  aka  [BX  6  0441] 

((T  *<KBT  4  0360>)) 

[10:41:27]  Adding  observation  of  T  at
[CHABCXBG-VBT  0  10000000000  [LL  [HOLE  2  8194]]] 

There  are  10  diagnoses  (entropy  3.172)  accounting  for  .96:
0.168  [[(U43  Other)]]


Entropy  Signal;  Aliases;  Value-Environment  Pairs

0.7093  [CHAEG X86- WET  0  10000000000  [LL  [HOLE  2  8112]]] 
aka  [PX8  21  043]  aka  [XE  4  0431] 

((T  #<BBY  4  0360>)) 


[10:42:33]  Adding  observation  of  T  at
[CHAICI1C-V1T  0  10000000000  [LL  [BO LX  2  1112]]] 

There  are  10  diagnoses  (entropy  3.172)  accounting  for  .06:
0.168  [[(U43  Other)]]


Entropy  Signal;  Aliases;  Value-Environment  Pairs

0.6676  [CBAI6XI0-VET  0  10000000000  [LL  [BOLX  2  1208]]] 
aka  [PII  24  043]  aka  CXI  7  043A] 

((T  #<117  3  0320>)) 

[10:43:38]  Adding  observation  of  T  at 
[CBAI6XI6-VkT  0  10000000000  [LL  [BOLX  2  1208]]] 

•  •  • 

There  are  10  diagnoses  (entropy  3.172)  accounting  for  .06:
0.168  [[(U43  Other)]]


•  •  •

Entropy  Signal;  Aliases;  Value-Environment  Pairs

0.6676  [CHAI6Xia-VkT  0  10000000000  [LL  [BOLX  1  1280]]] 
aka  [PII  16  021]  aka  [XX  6  021 A] 

((T  «<XXT  3  0320>)) 

[10:44:41]  Adding  observation  of  T  at 
[CBAXQIia-VkT  0  10000000000  (XL  [BOLX  1  1280]]] 

•  •  • 

There  are  10  diagnoses  (entropy  3.172)  accounting  for  .06:
0.168  [[(U43  Other)]]


•  •  •

Entropy  Signal;  Aliases;  Value-Environment  Pairs

0.6676  [CBAiaiXO-UXT  0  10000000000  [LL  [BOLX  2  148]]] 
aka  [PII  28  043]  aka  [XI  11  043A] 

((T  #<117  3  0320>)) 

[10:46:47]  Adding  observation  of  T  at 
[CBAIGIie-VKT  0  10000000000  [LL  [BOLX  2  148]]] 

There  are  10  diagnoses  (entropy  3.172)  accounting  for  .96:
0.168  [[(U43  Other)]]


Entropy  Signal;  Aliases;  Value-Environment  Pairs

0.6612  [CBAI6XI6-VXT  0  10000000000  [LL  [BOLX  2  1232]]] 
aka  [PXI  11  043]  aka  [XI  VX  043 A] 

((T  #<XIY  3  01210>)) 

[10:46:60]  Adding  observation  of  T  at 
[CHAiaXK-WET  0  lOOOOOOOOOO  [LL  [HO LI  2  1232]]]

•  •  • 

There  are  10  diagnoses  (entropy  3.172)  accounting  for  .06:
0.168  [[(U43  Other)]]


Entropy  Signal;  Aliases;  Value-Environment  Pairs

0.4377  [CHAI6XI8-VET  0  10000000000  [LL  [HOLE  1  1130]]] 
aka  [PZI  13  U21]  aka  [BZ  1  021  A] 

(CT  #<**?  2  0120>)) 

[10:47:68]  Adding  observation  of  T  at
[CHAI8XI8-VET  0  10000000000  [LL  [HO LI  1  1130]]] 

There  are  10  diagnoses  (entropy  3.172)  accounting  for  .06: 
0.168  [[(U43  Other)]]


Entropy  Signal;  Aliases;  Value-Environment  Pairs 

0.4377  [CHAI6XI6-VRT  0  10000000000  [LL  [HOLS  1  1131]]] 
aka  [PZI  7  021]  aka  [BZ  0  021A] 

((T  #<EIV  2  0120>)) 

[10:40:08]  Adding  observation  of  T  at 
[CHAieZIO-HBT  0  10000000000  [LL  [HOLE  1  1131]]] 

•  •  •

There  are  10  diagnoses  (entropy  3.172)  accounting  for  .06:
0.168  [[(U43  Other)]]


•  •  • 

Entropy  Signal;  Aliases;  Value-Environment  Pairs 

0.4377  [CHAieZie-UET  0  10000000000  [LL  [BOLE  2  1246]]] 
aka  [PZI  26  043]  aka  [XI  0  043A] 

((T  #<EIV  2  0120>)) 

[10:60:16]  Adding  observation  of  1  at 
[LL  [HOLE  2  1246]] 

Conflict!  There  are  2  diagnoses  (entropy  0.023)  accounting  for  .06:
0.660  [[(U21  Other)]]

There  are  2  diagnoses  (entropy  0.023)  accounting  for  .06:

0.660  [[(U21  Other)]]


Entropy  Signal;  Aliases;  Value-Environment  Pairs

0.6408  [CHAIQXIQ-VKT  0  10000000000  [LL  [HOLE  3  1160]]] 
aka  [PZI  10  043]  aka  [XI  2  043A] 

((T  #<SIV  2  0140>)) 
[10:61:13]  Adding  observation  of  T  at
[CBAiaXK-VtT  0  10000000000  [IX  [BOU  3  1160]]} 

There  are  2  diagnoses  (entropy  0.023)  accounting  for  .06:
0.660  [[(U21  Other)]]


Entropy  Signal;  Aliases;  Value-Environment  Pairs

0.640S  [CHAI6 XI6- VET  0  10000000000  [LL  [10U  4  1264]]] 
aka  [PXI  10  044]  aka  [XI  SI  044A] 

((IXL  #<m  1  0100>)) 

[10:62:02]  Adding  observation  of  1  at 
[LL  [mu  4  1264]] 

•  •  • 

Conflict!  There  are  1  diagnoses  (entropy  0.016)  accounting  for  .06:
0.000  [[(U21  Other)]]

There  are  1  diagnoses  (entropy  0.016)  accounting  for  .06:

0.000  [[(U21  Other)]]


[Chart:  Probes  (Nineteen)  versus  Diagnoses  afterwards]


A.8  Audio  Decoder  Example  IV 


There  are  1  diagnoses  (entropy  0.000)  accounting  for  .96:

1.000  [[]] 

[20:16:12]  Adding  observation  of  30  at
[MAX-MXI-W  100000000  [VOLTAGE  [BOLE  1  1272]]] 

There  are  1  diagnoses  (entropy  0.000)  accounting  for  .96: 

1.000  [[]]

[20:16:19]  Adding  observation  of  20000.0  at 
[FW  60000000  ’  (IIL  T)  [CE0SS  (EXPT  2  11)  [VOLTAGE  [BOLE  1  1272]]]] 

There  are  1  diagnoses  (entropy  0.000)  accounting  for  .96: 

1.000  [[]] 

[20:16:24]  Adding  observation  of  20000.0  at 
[FW  60000000  ’(MIL  T)  [CROSS  0  [DT  [VOLTAGE  [BOLE  1  1272]]]]] 

There  are  10  diagnoses  (entropy  3.269)  accounting  for  .96:

0.163  [[(U43  Other)]]

Decomposing  (#<ASSUMPTION  +INF  [STATUS-OF  U43  WORKING]>)

There  are  10  diagnoses  (entropy  3.269)  accounting  for  .96:

0.163  [[(U43  Other)]]

Decomposing  (#<ASSUMPTION  +INF  [STATUS-OF  U12  WORKING]>)

There  are  10  diagnoses  (entropy  3.269)  accounting  for  .96:

0.163  [[(U43  Other)]]

Decomposing  (#<ASSUMPTION  +INF  [STATUS-OF  U44  WORKING]>)

•  •  •

There  are  10  diagnoses  (entropy  3.269)  accounting  for  .96:

0.163  [[(U43  Other)]]

Decomposing  (#<ASSUMPTION  +INF  [STATUS-OF  U44  WORKING]>)

There  are  10  diagnoses  (entropy  3.269)  accounting  for  .96:

0.163  [[(U43  Other)]]

Decomposing  (#<ASSUMPTION  +INF  [STATUS-OF  U44  WORKING]>)

There  are  10  diagnoses  (entropy  3.269)  accounting  for  .96:

0.163  [[(U43  Other)]]

Decomposing  (#<ASSUMPTION  +INF  [STATUS-OF  U21  WORKING]>)

There  are  10  diagnoses  (entropy  3.269)  accounting  for  .96:

0.163  [[(U43  Other)]]
Entropy  Signal;  Aliases;  Value-Environment  Pairs

0.8291  [CHAI0XI6-V1T  0  10000000000  [LL  [HOLE  2  1290]]] 
aka  [PZI  10  043]  aka  [II  CS  0431] 

C(T  #<S1T  «  01307>)) 

[20:19:11]  Adding  observation  of  T  at
[CH1I61E0-VET  0  10000000000  [LL  [HOLE  2  E290]]] 

There  are  10  diagnoses  (entropy  3.269)  accounting  for  .96:
0.163  [[(U43  Other)]]


Entropy  Signal;  Aliases;  Value-Environment  Pairs
[CHA10I16-U1T  0  10000000000  [LL  [HOLS  1  1280]]] 
aka  [PZI  16  011]  aka  [O0T  TC  0111] 

<(T  8<EHT  6  0307>)) 


[20:20:14]  Adding  observation  of  1  at
[LL  [HOLE  1  1280]] 


Conflict!  There  are  6  diagnoses  (entropy  2.288)  accounting  for  .96:
0.263  [[(U12  Other)]]


There  are  6  diagnoses  (entropy  2.288)  accounting  for  .96:
0.263  [[(U12  Other)]]

Decomposing  (#<ASSUMPTION  +INF  [STATUS-OF  U10  WORKING]>)

There  are  6  diagnoses  (entropy  2.288)  accounting  for  .96:
0.263  [[(U12  Other)]]

Decomposing  (#<ASSUMPTION  +INF  [STATUS-OF  U11  WORKING]>)

There  are  6  diagnoses  (entropy  2.288)  accounting  for  .96:
0.263  [[(U12  Other)]]

Decomposing  (#<ASSUMPTION  +INF  [STATUS-OF  U20  WORKING]>)

There  are  6  diagnoses  (entropy  2.288)  accounting  for  .96:
0.263  [[(U12  Other)]]


Entropy  Signal;  Aliases;  Value-Environment  Pairs

0.7167  [CHA16X1G-V1T  0  10000000000  [LL  [HOLE  1  1169]]] 
aka  [PX1  8  116]  aka  [BX  8  1161] 

((T  8<HT  4  0360>)) 

[20:22:36]  Adding  observation  of  T  at
[CHAieil«-VlT  0  10000000000  [LL  [HOLE  1  1169]]] 

There  are  6  diagnoses  (entropy  2.288)  accounting  for  .96:
0.263  [[(U12  Other)]]


Entropy  Signal;  Aliases;  Value-Environment  Pairs

0.7107  [CBilGXIG-WtT  0  10000000000  [LX.  EBOLX  1  1104]]] 
aka  [PII  7  tie]  aka  [BX  7  U6A] 

((T  KBIT  4  O30O>)) 

[20:23:31]  Adding  observation  of  T  at
[CBAIGIIG-VKT  0  10000000000  ELL  EHOLI  1  1104]]] 

There  are  6  diagnoses  (entropy  2.200)  accounting  for  .OS:
0.203  [[(U12  Other)]]


Entropy  Signal;  Aliases;  Value-Environment  Pairs

0.7107  [CBAIGIIQ-VtT  0  10000000000  ELL  EHOLI  2  1130]]] 
aka  EPXI  23  043]  aka  EXI  0  043A] 

((I  KBIT  3  0320>)) 

[20:24:37]  Adding  observation  of  T  at
[CBAI6XI6-VBT  0  10000000000  ELL  EHOLB  2  1130]]] 

There  are  6  diagnoses  (entropy  2.280)  accounting  for  .06:
0.203  [[(U12  Other)]]


Entropy  Signal;  Aliases;  Value-Environment  Pairs
ECIIAIQXIG-VKT  0  10000000000  ELL  EHOLB  2  188]]] 
aka  EPXI  26  043]  aka  EXI  8  043 A) 

((T  KBIT  3  0320>) ) 


[20:20:23]  Adding  observation  of  T  at
[CBAIGXIG-VBT  0  10000000000  [LL  [BO LB  2  188]]] 

There  are  6  diagnoses  (entropy  2.288)  accounting  for  .95:
0.203  [[(U12  Other)]]


Entropy  Signal;  Aliases;  Value-Environment  Pairs

0.7107  [CBAIGIIG-WRT  0  10000000000  ELL  [BOLB  1  1240]]] 
aka  [PXI  5  021]  aka  [BX  4  021A] 

( (T  KBIT  3  0320>)) 

[20:27:21]  Adding  observation  of  T  at
[CBAIGXIG-VRT  0  10000000000  ELL  [BOLB  1  1240]]] 

There  are  6  diagnoses  (entropy  2.288)  accounting  for  .95:
0.203  [[(U12  Other)]]


Entropy  Signal;  Aliases;  Value-Environment  Pairs

0.7107  [CHAICXIG-UET  0  10000000000  [LL  [HOLS  1  1388]]]
aka  [MI  16  031]  aka  CBI  6  U21A] 

((T  #<EIT  3  0320») 

[20:28:24]  Adding  observation  of  T  at 
[CHAIGIIG-WET  0  10000000000  [LL  [HOLE  1  1288]]] 

There  are  6  diagnoses  (entropy  2.288)  accounting  for  .86:
0.363  [[(U12  Other)]]


•  •  •

Entropy  Signal;  Aliases;  Value-Environment  Pairs

0.7167  [CHAIGXIG-WET  0  10000000000  [XX  [HOLE  2  148]]] 
aka  [MI  28  043]  aka  [XI  11  043 A] 

((T  #<UT  3  0320>» 

[20:20:22]  Adding  observation  of  T  at 
[CHAIGXIG-VET  0  10000000000  [LL  [BOU  2  148]]] 

There  are  6  diagnoses  (entropy  2.288)  accounting  for  .86:
0.263  [[(U12  Other)]]


Entropy  Signal;  Aliases;  Value-Environment  Pairs

0.7166  [CHAIGIIG-WET  0  10000000000  CtX  [HOLE  2  1223]]] 
aka  [MI  11  020]  aka  [XI  A  0200] 

((ML  8<EIV  2  06>)) 

[20:30:30]  Adding  observation  of  T  at
[CHAIGXIO-VET  0  10000000000  [LL  [HOLE  2  1223]]] 

•  •  •

Conflict!  There  are  2  diagnoses  (entropy  0.887)  accounting  for  .86:
0.633  [[(U11  Other)]]

There  are  2  diagnoses  (entropy  0.887)  accounting  for  .86:

0.633  [[(U11  Other)]]


Entropy  Signal;  Aliases;  Value-Environment  Pairs

0.8367  [CHAIGXIG-WET  0  10000000000  [LL  [HOLE  3  1101]]] 
aka  [PXI  8  010]  aka  [XI  LOAD  010A] 

((ML  8<EIT  1  01>)) 

[20:31:12]  Adding  observation  of  T  at
[CHAIGXIG-VET  0  10000000000  [LL  [HOLE  3  1101]]] 

There  are  1  diagnoses  (entropy  0.000)  accounting  for  .86:
1.000  [[(U11  Other)]]

[Chart:  Probes  (Fourteen)  versus  Diagnoses  afterwards]

A.9  Audio  Decoder  Example  IV  with  Syndromes


There  are  1  diagnoses  (entropy  0.000)  accounting  for  .OS:

1.000  [[]]

[00:37:40]  Adding  observation  of  30  at
[MAX-MXI-W  100000000  [V0LTA6E  [B0LS  1  1273])] 

There  are  1  diagnoses  (entropy  0.000)  accounting  for  .06:

1.000  [[]]

[00:37:67]  Adding  observation  of  20000.0  at
CFW  60000000  '(MIL  T)  [CUSS  (EXPT  2  11)  [T0LTACK  [HOLE  1  1272]]]] 

There  are  1  diagnoses  (entropy  0.000)  accounting  for  .06:

1.000  [[]]

[00:38:02]  Adding  observation  of  20000.0  at
[KW  60000000  >[IXL  T)  [CUSS  0  [DT  [VOLTAGE  [HOLE  1  1272]]]]] 

There  are  10  diagnoses  (entropy  3.200)  accounting  for  .06:

0.103  [[(U43  Other)]]

Refining  U12  with  INACTIVE

Conflict!  There  are  10  diagnoses  (entropy  3.203)  accounting  for  .06:
0.103  [[(U43  Other)]]

•  •  •

There  are  10  diagnoses  (entropy  3.203)  accounting  for  .06:

0.103  [[(U43  Other)]]

Refining  U11  with  CSB-INACTIVE

Conflict!  There  are  11  diagnoses  (entropy  3.320)  accounting  for  .06:
0.103  [[(U43  Other)]]

Conflict!  There  are  10  diagnoses  (entropy  3.200)  accounting  for  .06:
0.108  [[(U43  Other)]]

There  are  10  diagnoses  (entropy  3.200)  accounting  for  .06:

0.108  [[(U43  Other)]]

Refining  U10  with  CSB-INACTIVE

Conflict!  There  are  10  diagnoses  (entropy  3.172)  accounting  for  .06:
0.108  [[(U43  Other)]]

There  are  10  diagnoses  (entropy  3.172)  accounting  for  .06:

0.108  [[(U43  Other)]]


Decomposing  (#<ASSUMPTION  +INF  [STATUS-OF  U43  WORKING]>)
•  •  •

There  are  10  diagnoses  (entropy  3.172)  accounting  for  .OS:
0.100  [[(U43  Other)]]

Decomposing  (#<ASSUMPTION  +INF  [STATUS-OF  U21  WORKING]>)

•  •  •

There  are  10  diagnoses  (entropy  3.172)  accounting  for  .06:
0.108  [[(U43  Other)]]

Decomposing  (#<ASSUMPTION  +INF  [STATUS-OF  U21  WORKING]>)

There  are  10  diagnoses  (entropy  3.172)  accounting  for  .06:
0.108  [[(U43  Other)]]

Decomposing  (#<ASSUMPTION  +INF  [STATUS-OF  U21  WORKING]>)

There  are  10  diagnoses  (entropy  3.172)  accounting  for  .06:
0.108  [[(U43  Other)]]

Decomposing  (#<ASSUMPTION  +INF  [STATUS-OF  U44  WORKING]>)

There  are  10  diagnoses  (entropy  3.172)  accounting  for  .06:
0.108  [[(U43  Other)]]


•  •  •

Entropy  Signal;  Aliases;  Value-Environment  Pairs

0.8179  [CHAIGZie-VRT  0  10000000000  [IX  [HOLE  2  1200]]] 
aka  [PZH  10  049]  aka  [II  CS  U43A] 

((T  KBIT  0  01307>)) 

[00:43:12]  Adding  observation  of  T  at 
[CHAI6II0-VET  0  10000000000  [LL  [HOLE  2  1200]]] 

•  •  •

There  are  10  diagnoses  (entropy  3.172)  accounting  for  .06:
0.108  [[(U43  Other)]]


Entropy  Signal;  Aliases;  Value-Environment  Pairs

0.7460  [CHAIO IIO-HHT  0  10000000000  [LL  [HOLE  1  1280]]] 
aka  [PII  16  Oil]  aka  [OUT  TC  U11A] 

((T  #<KIV  6  0307>)) 

[00:44:16]  Adding  obaervation  of  1  at 
[LL  [HOLE  1  1280]] 

Conflict!  There  are  7  diagnoses  (entropy  2.837)  accounting  for  .Ob:
0.100  [[(U12  Other)]]

Conflict!  There  are  8  diagnoses  (entropy  2.462)  accounting  for  .06:
0.210  [[(U12  Other)]]


Conflict!  There  are  5  diagnoses  (entropy  2.302)  accounting  for  .06:
0.220  [[(U12  Other)]]


There  are  6  diagnoses  (entropy  3.302)  accounting  for  .OS:

0.220  [[(U12  Other)]]

Refining  U20  with  CSB-INACTIVE

•  •  •

Conflict!  There  are  5  diagnoses  (entropy  2.202)  accounting  for  .OS:
0.238  [[(U12  Other)]]

•  •  •

There  are  5  diagnoses  (entropy  2.303)  accounting  for  .OS:

0.238  [[(U12  Other)]]

Decomposing  (#<ASSUMPTION  +INF  [STATUS-OF  U12  WORKING]>)

•  •  •

There  are  6  diagnoses  (entropy  2.202)  accounting  for  .06:

0.238  [[(U12  Other)]]

Decomposing  (#<ASSUMPTION  +INF  [STATUS-OF  U10  WORKING]>)

There  are  6  diagnoses  (entropy  2.202)  accounting  for  .06:

0.238  [[(U12  Other)]]

Decomposing  (#<ASSUMPTION  +INF  [STATUS-OF  U11  WORKING]>)

•  •  •

There  are  6  diagnoses  (entropy  2.202)  accounting  for  .OS:

0.238  [[(U12  Other)]]

Decomposing  (#<ASSUMPTION  +INF  [STATUS-OF  U20  WORKING]>)

•  •  •

There  are  6  diagnoses  (entropy  2.202)  accounting  for  .06:

0.238  [[(U12  Other)]]


Entropy  Signal;  Aliases;  Value-Environment  Pairs

1.0000  [IX  [BOLE  I  1213]] 

aka  [PIE  16  010]  aka  [OUT  TC  0101] 

((0  8<KET  3  041 02>  #<E*T  3  020102>)) 

[00:48:66]  Adding  observation  of  T  at 
[CHAI6II8-VET  0  10000000000  [LL  [HOLE  1  1213]]] 

There  are  6  diagnoses  (entropy  2.203)  accounting  for  .06:
0.230  [[(U12  Other)]]


Entropy  Signal;  Aliases;  Value-Environment  Pairs

1.0000  [IX  [HOLE  2  168]] 

aka  [PX1  12  021]  aka  [XI  CLOCK  0211] 

((0  #<KN V  2  04100>  #<ENV  2  010100>  #<EEV  2  020100>)) 

[00:60:16]  Adding  observation  of  T  at 
[CHUOXIO-VET  0  10000000000  [LL  [HOLE  2  168]]] 
There  are  6  diagnoses  (entropy  2.393)  accounting  for  .93:
0.239  [[(U12  Other)]]


Entropy  Signal;  Aliases;  Value-Environment  Pairs

0.7423  [CHAIOZIO-VET  0  10000000000  CU.  [H0LX  2  1230]]] 
•kn  CPU  14  044]  aka  OX  3  0441] 

((T  KUT  4  0340>)) 

[09:61:24]  Adding  observation  of  T  at
[CHAIOXie-VBT  0  10000000000  [LX  [B0 LI  2  1236]]] 

•  •  • 

There  are  6  diagnoses  (entropy  2.293)  accounting  for  .96:
0.239  [[(U12  Other)]]


Entropy  Signal;  Aliases;  Value-Environment  Pairs

0.7423  [CBlfaZIO-VKT  0  10000000000  [IX  [HOLE  2  1117]]] 
aka  CPZH  6  044]  aka  [BZ  4  0441] 

((T  KBIT  4  0300>)) 

•  •  •

[09:62:26]  Adding  observation  of  T  at
[CH1E6ZE6-VET  0  10000000000  [LL  [BOLE  2  1117]]] 

There  are  6  diagnoses  (entropy  2.293)  accounting  for  .96:
0.239  [[(U12  Other)]]


•  •  •

Entropy  Signal;  Aliases;  Value-Environment  Pairs

0.7423  [CHAEOZIO-VET  0  10000000000  [LL  [HOLE  2  E208]]] 
aka  [PZI  24  U43]  aka  [ZI  7  0431] 

((T  KBIT  3  0320>)) 

[09:63:36]  Adding  observation  of  T  at
[CHAI6ZI6-VHT  0  10000000000  [LL  [HOLE  2  1208]]] 


There  are  6  diagnoses  (entropy  2.293)  accounting  for  .96:
0.239  [[(U12  Other)]]


Entropy  Signal;  Aliases;  Value-Environment  Pairs

0.7423  [CHABCZI6-UHT  0  10000000000  [LL  [HOLE  1  1289]]] 
aka  [PZI  16  021]  aka  [BZ  6  0211] 

((T  KBIT  3  0320>)) 

[09:64:40]  Adding  observation  of  T  at
[CHAE6ZE0-VET  0  10000000000  [LL  [HOLE  1  1289]]] 

There  are  6  diagnoses  (entropy  2.293)  accounting  for  .96:
0.239  [[(U12  Other)]]

Entropy  Signal;  Aliases;  Value-Environment  Pairs

0.7423  [CHAISIIC-VXT  0  10000000000  [LL  [SOLI  2  148]]] 
aka  CPXS  38  048]  aka  CXI  11  0431] 

((T  #<HT  8  0820>)) 

[00:68:43]  Adding  observation  of  T  at 
[CHAIGXI6-VAT  0  10000000000  [LL  [HOLE  3  148]]] 


There  are  6  diagnoses  (entropy  3.298)  accounting  for  .96:
0.289  [[(U12  Other)]]


Entropy  Signal;  Aliases;  Value-Environment  Pairs

0.6808  [CHAI0XI8-MIT  0  10000000000  [LL  [HO LX  2  1228]]] 
aka  [PXI  11  030]  aka  [XI  A  0200] 

((MIL  #<m  3  06>)) 

[09:66:46]  Adding  observation  of  T  at 
[CHAiaiK-tnT  0  10000000000  [LL  [HOLE  2  1228]]] 

Conflict!  There  are  2  diagnoses  (entropy  0.988)  accounting  for  .96:
0.667  [[(U11  Other)]]

•  •  •

There  are  2  diagnoses  (entropy  0.088)  accounting  for  .96:

0.667  [[(U11  Other)]]


•  •  •

Entropy  Signal;  Aliases;  Value-Environment  Pairs

0.8630  [CHAiaXIO-VET  0  10000000000  [LL  [HOLE  8  1101]]] 
aka  [PXI  9  010}  aka  [XI  LOAD  010A] 

((■XL  #<HV  1  01>)) 

[09:67:40]  Adding  observation  of  T  at 
[CHAI6XI6-VHT  0  10000000000  [LL  [HOLE  8  1101]]] 

There  are  1  diagnoses  (entropy  0.010)  accounting  for  .96: 
0.998  [[(U11  Other)]]


[Chart:  Probes  (Fourteen)  versus  Diagnoses  afterwards]

APPENDIX  A.  SCENARIO  TRANSCRIPTS


A.10  Input  Encoder  Example  I


There  are  1  diagnoses  (entropy  0.000)  accounting  for  .96:

1.000  [[]]

[16:19:20]  Adding  observation  of  T  at
[POWER  CXI  POWER  3370]] 

There  are  1  diagnoses  (entropy  0.000)  accounting  for  .96:

1.000  [[]]

[16:22:17]  Adding  observation  of  0  at
[LL  [HOLS  2  1897] 

There  are  1  diagnoses  (entropy  0.000)  accounting  for  .96:

1.000  [[]]

[16:22:27]  Adding  observation  of  1  at
[LL  [HOLE  2  183]] 

•  •  •

There  are  1  diagnoses  (entropy  0.000)  accounting  for  .96:

1.000  [[]]

[10:23:23]  Adding  observation  of  0  at
[LL  [BOLE  2  183]] 

•  •  •

There  are  1  diagnoses  (entropy  0.000)  accounting  for  .96:

1.000  [[]]

[16:26:04]  Adding  observation  of  NIL  at
CKS  CXI  PAD  U]] 

•  •  •

There  are  1  diagnoses  (entropy  0.000)  accounting  for  .96:

1.000  [[]]

•  •  •

[16:26:06]  Adding  observation  of  NIL  at
[KS  [XI  RED  9]] 

There  are  1  diagnoses  (entropy  0.000)  accounting  for  .96:

1.000  [[]]

[16:26:11]  Adding  observation  of  NIL  at
[KT  [OUT  RETS  CJ] 

There  are  1  diagnoses  (entropy  0.000)  accounting  for  .96:

1.000  [[]]

[16:26:12]  Adding  observation  of  T  at
[CBAICII6-VRT  1000000000  10000000000  [XP  [II  XDX  0]]] 

There  are  1  diagnoses  (entropy  0.000)  accounting  for  .96:

1.000  [[]]

[16:25:16]  Adding  observation  of  T  at
[C8iiaXI6-VKT  1000000000  10000000000  CNF  [XV  KDT  U]]] 

•  •  •

There  are  1  diagnoses  (entropy  0.000)  accounting  for  .05:

1.000  [[]]

[16:26:19]  Adding  observation  of  NIL  at
[CHAI6II6-VKT  1000000000  10000000000  [MP  [II  MB  0]]] 

There  are  1  diagnoses  (entropy  0.000)  accounting  for  .96: 

1.000  [[]] 

[16:25:22]  Adding  observation  of  NIL  at
[CHAiaXM-VKT  1000000000  10000000000  [HP  [OUT  KDX  C]]] 

There  are  18  diagnoses  (entropy  3.932)  accounting  for  .96:
0.136  [[(U26  Other)]]

Refining  U26  with  CPU

There  are  10  diagnoses  (entropy  3.931)  accounting  for  .95:
0.136  [[(U26  Open)]]

Decomposing  (#<ASSUMPTION  +INF  [STATUS-OF  U26  WORKING]>)

There  are  16  diagnoses  (entropy  3.931)  accounting  for  .96:
0.136  [[(U26  Open)]]

Decomposing  (#<ASSUMPTION  +INF  [STATUS-OF  U33  WORKING]>)

•  •  •

There  are  18  diagnoses  (entropy  3.931)  accounting  for  .95:
0.135  [[(U25  Open)]]

Decomposing  (#<ASSUMPTION  +INF  [STATUS-OF  US4  WORKING]>)

•  •  •

There  are  18  diagnoses  (entropy  3.931)  accounting  for  .96:
0.136  [[(U26  Open)]]

[16:31:28]  Adding  observation  of  NIL  at
[CHAIGXIO-VtT  1000000000  10000000000  [MP  [OUT  MDY  C]]] 

There  are  18  diagnoses  (entropy  3.931)  accounting  for  .96: 
0.136  [[(U26  Open)]] 

[16:32:39]  Adding  observation  of  NIL  at
[CHAI6IIQ-V&T  1000000000  10000000000  [MP  [OUT  KB  C]]] 

There  are  18  diagnoses  (entropy  3.931)  accounting  for  .96: 
0.136  [[(U26  Open)]] 


Entropy  Signal;  Aliases;  Value-Environment  Pairs

0.9898  [CHAIOXIU-VRT  1000000000  10000000000  [LL  [BOLI  1  1178]]] 
aka  [PXI  36  U34]  aka  [BX  20  U34A] 


((T  #<UT  0  0777>)  (XXL  #<m  14  01777036>))

[16:36:43]  Adding  observation  of  1  at
[LL  [BO LI  1  1178]] 

There  are  0  diagnoses  (entropy  2.806)  accounting  for  .06:
0.280  [[(U26  Open)]]


Entropy  Signal;  Aliases;  Value-Environment  Pairs

1.0448  [CHAIGIIC-WT  1000000000  10000000000  [LL  [B0LI  1  1267]]] 
aka  [PXI  10  U7]  aka  [BZ  10  U?A] 

((T  8<unr  6  077>)  (IZL  #<EIY  6  02000073>)) 

[16:30:26]  Adding  observation  of  166260.0  at 
[m  640000  >(1  0)  [LL  [HO LX  1  1267]}] 

[16:40:20]  Adding  observation  of  166260.0  at 
[FW  640000  *(0  1)  [LL  [HO LX  1  1267]]] 

There  are  7  diagnoses  (entropy  2.616)  accounting  for  .06:

0.332  [[(U34  Other)]]

Decomposing  (#<ASSUMPTION  +INF  [STATUS-OF  U30  WORKING]>)

There  are  7  diagnoses  (entropy  2.616)  accounting  for  .06:

0.332  [[(U34  Other)]]

Decomposing  (#<ASSUMPTION  +INF  [STATUS-OF  U14  WORKING]>)

There  are  7  diagnoses  (entropy  2.616)  accounting  for  .06:

0.332  [[(U34  Other)]]

Decomposing  (#<ASSUMPTION  +INF  [STATUS-OF  U32  WORKING]>)

•  •  •

There  are  7  diagnoses  (entropy  2.616)  accounting  for  .06:

0.332  [[(U34  Other)]]


Entropy  Signal;  Aliases;  Value-Environment  Pairs

0.8870  [CHAV6IV6-VIT  1000000000  10000000000  [LL  [HOLS  3  1162]]] 
aka  [PXI  4  033]  aka  [XI  IKSET  D33A] 

((T  *<KIT  3  0302>)  (MIL  #<EVT  6  0437>  #<IXT  6  0473>)) 

[16:46:20]  Adding  observation  of  0  at 
[LL  [HO LX  3  1162]] 

[16:46:41]  Adding  observation  of  1  at 
[LL  [HO LX  3  1162]] 

There  are  6  diagnoses  (entropy  2.070)  accounting  for  .06:

0.442  [[(U34  Other)]]


Entropy  Signal;  Aliases;  Value-Environment  Pairs

0.4987  [CHAIGIIG-VET  1000000000  10000000000  [LL  [BOLE  1  1130]]] 
aka  [PII  6  032]  aka  [OUT  T  OS2C] 

((T  #<UT  4  036>  Kin  6  073>)  (VZL  «UT  4  02000031>)) 

[16:47:46]  Adding  observation  of  5000000.0  at 
[FW  20000  *(0  1)  [LL  [BOLE  1  1130]]] 


There  are  6  diagnoses  (entropy  2.081)  accounting  for  .96:
0.443  [[(U34  Other)]]


Entropy  Signal;  Aliases;  Value-Environment  Pairs

0.4724  [CBAVGXI6-HBT  1000000000  10000000000  [LL  [BOLE  2  1243]]] 
aka  [PII  3  033]  aka  [II  XTAL2  0331] 

((T  KBIT  4  036>  #<EIT  6  073>)) 

[16:49:27]  Adding  observation  of  6000000.0  at 
[FW  20000  *(1  0)  [LL  [BOLE  2  1243]]] 


There  are  6  diagnoses  (entropy  2.081)  accounting  for  .95:
0.443  [[(U34  Other)]]


Entropy  Signal;  Aliases;  Value-Environment  Pairs

0.4724  [CBAIOIIQ-URT  1000000000  10000000000  [LL  [BOLE  1  184]]] 
aka  [PII  10  032]  aka  [00T  T  032E] 

((T  #<EIT  4  036>  *<EIT  6  073>)) 

•  •  •

[16:61:01]  Adding  observation  of  6000000.0  at 
[FW  20000  >(0  1)  [LL  [BOLE  1  184]]] 


There  are  6  diagnoses  (entropy  2.081)  accounting  for  .96:
0.443  [[(U34  Other)]]


Entropy  Signal;  Aliases;  Value-Environment  Pairs

0.4724  [CBAIOIIQ-VKT  1000000000  10000000000  [LL  [BOLE  4  173]]] 
aka  [PII  2  034]  aka  [II  XTAL1  034A] 

((T  #<EIT  4  036>  #<EIY  5  073>)) 

[16:62:33]  Adding  observation  of  6000000.0  at 
[FW  20000  ’(1  0)  [LL  [HOLE  4  173]]] 

There  are  5  diagnoses  (entropy  2.081)  accounting  for  .96:

0.443  [[(U34  Other)]]


Entropy  Signal;  Aliases;  Value-Environment  Pairs

0.4724  [CHAIGII8-VET  1000000000  10000000000  [CC  C6MHZL]] 
aka  CPIV  3  033]  aka  £XI  XTAL2  9331]

((T  #<UT  4  03 B>  #<BTT  6  073>)) 

[16:64:08]  Adding  observation  of  6000000.0  at
[FW  30000  *(IZL  T)  [CC  C6KBZL]] 

There  are  3  diagnoses  (entropy  1.321)  accounting  for  .96:
0.624  [[(U34  Other)]]


Entropy  Signal;  Aliases;  Value-Environment  Pairs

0.2806  [CHAIGZIC-VBT  1000000000  10000000000  [LL  [BOLE  1  146]]] 
aka  [PIE  4  032]  aka  [OUT  T  032B] 

((T  #<EET  2  014>  #<KIY  3  031>)  (VZL  #<EIT  2  02000010)) 

[16:66:41]  Adding  obaarvation  of  1.0o7  at 
[FW  10000  '(0  1)  [LL  [BOLE  1  146]]] 

[16:66:62]  Adding  obaarvation  of  1.0a7  at 
[FW  10000  *(10)  [LL  [BOLE  1  146]]] 

There are 3 diagnoses (entropy 1.321) accounting for .96:

0.624 [[(U34 Other)]]


Entropy Signal; Aliases; Value-Environment Pairs

0.2799 [CHANGING-WRT 1000000000 10000000000 [CC C6MBZB]]
aka [PIN 2 U34] aka [IN XTAL1 U34A]

((T #<ENV 1 010>))

• • •

[16:57:08] Adding observation of 6000000.0 at
[FWW 20000 '(T NIL) [CC C6RBZ8]]


There are 2 diagnoses (entropy 0.721) accounting for .96:
0.800 [[(U34 Other)]]


Entropy Signal; Aliases; Value-Environment Pairs

0.2670 [CHANGING-WRT 1000000000 10000000000 [LL [HOLE 6 167]]]
aka [PIN 9 U26] aka [IN LOAD U26A]

((NIL #<ENV 1 02>))

[16:58:21] Adding observation of 1 at
[LL [HOLE 6 167]]

There are 1 diagnoses (entropy 0.000) accounting for .96:

1.000 [[(U34 Other)]]

A.11 Input Encoder Example II


There are 1 diagnoses (entropy 0.000) accounting for .06:

1.000 [[]]

[10:27:37]  Adding  observation  of  T  at 
[P0VXX  [XV  P0VEX  S370]] 

There  are  1  diagnoses  (entropy  0.000)  accounting  for  .06: 

1.000 [[]]

[10:30:20] Adding observation of 0 at
[LL [HOLE 2 183]]

There are 1 diagnoses (entropy 0.000) accounting for .06:

1.000 [[]]

[10:30:27] Adding observation of 1 at
[LL [HOLE 2 183]]

There are 1 diagnoses (entropy 0.000) accounting for .06:

1.000 [[]]

[10:31:02] Adding observation of 0 at
[LL [HOLE 2 183]]

•  •  • 

There  are  1  diagnoses  (entropy  0.000)  accounting  for  .06: 

1.000  [[]] 

[10:33:20] Adding observation of NIL at
[KS [XI PAD U]]

There are 1 diagnoses (entropy 0.000) accounting for .06:

1.000 [[]]

[10:33:26] Adding observation of NIL at
[KS [II KBD 0]]

There are 1 diagnoses (entropy 0.000) accounting for .06:

1.000 [[]]

[10:33:30] Adding observation of NIL at
[KT [OUT KITS CJ]

There are 1 diagnoses (entropy 0.000) accounting for .06:

1.000 [[]]

[10:33:31] Adding observation of T at
[CHANGING-WRT 1000000000 10000000000 [KP [XI KDX 0]]]

There  are  1  diagnoses  (entropy  0.000)  accounting  for  .06: 

1.000  [[]] 

[10:33:34] Adding observation of T at
[CHANGING-WRT 1000000000 10000000000 [HP [ZI NOT U]]]

There are 1 diagnoses (entropy 0.000) accounting for .05:

1.000 [[]]

[10:33:38] Adding observation of NIL at
[CHANGING-WRT 1000000000 10000000000 [HP [ZI MB 0]]]

There are 1 diagnoses (entropy 0.000) accounting for .06:

1.000 [[]]

[10:33:42] Adding observation of NIL at
[CHANGING-WRT 1000000000 10000000000 [KP [OUT KDZ C]]]

There are 18 diagnoses (entropy 3.832) accounting for .86:
0.138 [[(U26 Other)]]

Refining U26 with OPEN

There are 18 diagnoses (entropy 3.831) accounting for .86:
0.136 [[(U26 Open)]]

Decomposing (#<ASSUMPTION +INF [STATUS-OF U26 WORKING]>)

There are 18 diagnoses (entropy 3.831) accounting for .85:
0.136 [[(U26 Open)]]

Decomposing (#<ASSUMPTION +INF [STATUS-OF U33 WORKING]>)

There are 18 diagnoses (entropy 3.831) accounting for .85:
0.136 [[(U26 Open)]]

Decomposing (#<ASSUMPTION +INF [STATUS-OF U34 WORKING]>)

There are 18 diagnoses (entropy 3.831) accounting for .85:
0.136 [[(U26 Open)]]

[10:40:28] Adding observation of NIL at
[CHANGING-WRT 1000000000 10000000000 [MP [OUT KDT C]]]

There are 18 diagnoses (entropy 3.831) accounting for .85:
0.136 [[(U26 Open)]]

[10:41:28] Adding observation of NIL at
[CHANGING-WRT 1000000000 10000000000 [MP [OUT MB C]]]

There are 18 diagnoses (entropy 3.831) accounting for .86:
0.136 [[(U26 Open)]]


Entropy Signal; Aliases; Value-Environment Pairs

0.8888 [CHANGING-WRT 1000000000 10000000000 [LL [HOLE 2 1178]]]
aka [PIN 1 U33] aka [IN TO U33A]

((T #<ENV 9 0777>) (NIL #<ENV 14 0177703E>))

[10:44:28] Adding observation of T at
[CHANGING-WRT 1000000000 10000000000 [LL [HOLE 2 H78]]]

There are 15 diagnoses (entropy 3.466) accounting for .96:
0.179 [[(U26 Open)]]


Entropy Signal; Aliases; Value-Environment Pairs

0.8188 [CHANGING-WRT 1000000000 10000000000 [LL [HOLE 1 1267]]]
aka [PIN 10 EI7] aka [BX 10 KI7A]

((T #<ENV 6 077>) (NIL #<ENV 6 02000073>))

[10:46:41] Adding observation of 166260.0 at
[FWW 640000 '(1 0) [LL [HOLE 1 1267]]]

[10:47:11] Adding observation of 166260.0 at
[FWW 640000 '(0 1) [LL [HOLE 1 1267]]]

There are 12 diagnoses (entropy 3.404) accounting for .96:

0.166 [[(U33 Other)]]

Decomposing (#<ASSUMPTION +INF [STATUS-OF U16 WORKING]>)

There are 12 diagnoses (entropy 3.404) accounting for .96:

0.166 [[(U33 Other)]]


• • •

Entropy Signal; Aliases; Value-Environment Pairs

0.3894 [WET 1000000002 2062428803 [LL [HOLE 1 181]]]
aka [PIN 6 U22] aka [IN B U22B]

((1 #<ENV 8 0100373> #<ENV 8 0100337>))

[10:49:47] Adding observation of 1 at
[LL [HOLE 1 181]]

There are 12 diagnoses (entropy 3.404) accounting for .96:
0.166 [[(U33 Other)]]


Entropy Signal; Aliases; Value-Environment Pairs

0.3894 [VET 1000000002 2062428803 [LL [HOLE 4 1137]]]
aka [PIN 10 U34] aka [IN VB U34A]

((1 #<ENV 8 0100373> #<ENV 8 0100337>))

[10:51:39] Adding observation of 1 at
[LL [HOLE 4 1137]]

There are 12 diagnoses (entropy 3.404) accounting for .96:
0.166 [[(U33 Other)]]


Entropy Signal; Aliases; Value-Environment Pairs

0.3894 [VET 1000000002 2052428803 [LL [HOLE 2 111]]]
aka [PIN 22 U16] aka [IN 01 U16A]

((1 #<ENV 8 0100373> #<ENV 8 0100337>))

[10:53:28] Adding observation of 1 at
[LL [HOLE 2 111]]

There are 4 diagnoses (entropy 1.770) accounting for .96:
0.488 [[(U33 Other)]]

Decomposing (#<ASSUMPTION +INF [STATUS-OF U30 WORKING]>)

There are 4 diagnoses (entropy 1.770) accounting for .96:
0.488 [[(U33 Other)]]

Decomposing (#<ASSUMPTION +INF [STATUS-OF U32 WORKING]>)

There are 4 diagnoses (entropy 1.770) accounting for .96:
0.488 [[(U33 Other)]]


Entropy Signal; Aliases; Value-Environment Pairs

0.6267 [CHANGING-WRT 1000000000 10000000000 [LL [HOLE 1 1130]]]
aka [PIN 8 U32] aka [OUT T U32C]

((T #<ENV 4 036> #<ENV 6 073>) (NIL #<ENV 4 02000031>))

[10:56:38] Adding observation of 6000000.0 at
[FWW 20000 '(0 1) [LL [HOLE 1 1130]]]

• • •

There are 4 diagnoses (entropy 1.771) accounting for .96:

0.490 [[(U33 Other)]]


Entropy Signal; Aliases; Value-Environment Pairs

0.4973 [CHANGING-WRT 1000000000 10000000000 [LL [HOLE 4 173]]]
aka [PIN 2 U34] aka [IN XTAL1 U34A]

((T #<ENV 4 036> #<ENV 6 073>))

[10:59:06] Adding observation of 6000000.0 at
[FWW 20000 '(1 0) [LL [HOLE 4 173]]]

There are 4 diagnoses (entropy 1.771) accounting for .96:

0.490 [[(U33 Other)]]


Entropy Signal; Aliases; Value-Environment Pairs

0.4973 [CHANGING-WRT 1000000000 10000000000 [CC C6MHZL]]
aka [PIN 3 U33] aka [IN XTAL2 U33A]

((T #<ENV 4 036> #<ENV 6 073>))

[11:00:39] Adding observation of 6000000.0 at
[FWW 20000 '(NIL T) [CC C6MHZL]]


There are 1 diagnoses (entropy 0.048) accounting for .95:
0.906 [[(U33 Other)]]


Appendix  B 


Abstractions  and  Behaviors 


The following definitions are discussed in Chapter 5 and collected here
alphabetically. Keep in mind that their purpose is mental hygiene, not
execution. The procedural style of definition has advantages of expressiveness
that outweigh its disadvantages. The expressiveness advantage is that the
language is simple enough that it is possible to introduce compound
definitions through composition of existing definitions and surface
transformations of the results; the rh behavior (Appendix C) is an example of
this technique. The disadvantage, that such transformations would be difficult
to automate and are intractable in general, is a long-term concern, but not an
overriding one. Likewise, the inefficiency of the procedures defined in some
cases is not a concern, since the troubleshooting program does not use these
procedures directly. Finally, there is no compelling reason that a more
declarative representation could not have been used, but the same tractability
problems would still arise: using a temporal logic as in [Moszkowski82], for
example, would not solve the problem of transformations being intractable.

accumulated-bits ==
(lambda (S V D)
  (lambda (time)
    (if (S time) 0
        (let ((previous
               ((accumulated-bits S V D) (- time δ))))
          (if (V time)
              (+ (if (eql (D time) 1) 1 0)
                 previous)
              previous)))))
brightness ==
(lambda (R C1 C2 Kbd Kpd M)
  (lambda (time)
    (let ((the-state ((c-state R C1 C2 Kbd Kpd M) time)))
      (if (eql 'init the-state) 128
          (max 0
               (min 255
                    (+ (if (eql 'local the-state)
                           (/ (- ((duration
                                   (key-is-pressed 'B Kbd)) time)
                                 ((duration
                                   (key-is-pressed 'D Kbd)) time))
                              3msec) 0)
                       ((brightness R C1 C2 Kbd Kpd M)
                        (- time
                           ((duration
                             (c-state R C1 C2 Kbd Kpd M))
                            time))))))))))


changing-wrt ==
(lambda (lb ub S)
  (lambda (time)
    (and (= time ub)
         (> ((count-ww (- ub lb) (change S)) time) 0))))


count-ww ==
(lambda (n S)
  (lambda (time)
    (if (<= n 0) 0
        (+ (if (S time) 1 0)
           ((count-ww (- n δ) S) (- time δ))))))
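The recursive style above treats a signal as a function from time to a value. As a rough illustration only, the count-ww definition can be sketched in Python under two assumptions that are not in the thesis: time is discrete with δ equal to one tick, and a signal is an ordinary function from an integer time to a value. The text notes that the troubleshooting program does not execute these procedures, so this is a reading aid rather than a description of the implementation.

```python
from typing import Callable

Signal = Callable[[int], object]
DELTA = 1  # assumed tick size; the thesis leaves δ abstract


def count_ww(n: int, s: Signal) -> Signal:
    """Count the ticks at which s holds in the window of width n ending at `time`."""
    def at(time: int) -> int:
        total = 0
        width, t = n, time
        while width > 0:  # mirrors (if (<= n 0) 0 ...)
            total += 1 if s(t) else 0
            width -= DELTA
            t -= DELTA
        return total
    return at


# Hypothetical signal for illustration: true on every third tick.
pulses = lambda t: t % 3 == 0
print(count_ww(6, pulses)(10))
```

The loop replaces the recursion on both n and time, which keeps the sketch within Python's recursion limit for wide windows.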


cross ==
(lambda (v S)
  (lambda (time)
    (let ((s0 (S (- time δ δ)))
          (s2 (S time)))
      (or (< s0 v s2) (< s2 v s0)))))


cycles-ww ==
(lambda (n l S)
  (count-ww n (sequence l S)))
dt ==
(lambda (S)
  (lambda (time)
    (let ((s0 (S (- time δ)))
          (s1 (S time)))
      (/ (- s1 s0) δ))))


duration ==
(lambda (S)
  (lambda (time)
    (if ((change S) time) δ
        (+ δ ((duration S) (- time δ))))))
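The duration behavior measures how long a signal has held its current value. The following Python sketch is one possible reading, with several assumptions labeled as such: δ is one tick, change (which is not defined in this appendix) is taken to mean that the value differs from the value one tick earlier, and a hypothetical `start` time bounds the otherwise unbounded recursion into the past.

```python
DELTA = 1  # assumed tick size


def change(s):
    """Assumed meaning of `change`: s differs from its value one tick earlier."""
    return lambda t: s(t) != s(t - DELTA)


def duration(s, start=0):
    """Ticks for which s has held its current value, counted back to `start`."""
    def at(time):
        d = DELTA
        t = time
        # Walk back until the most recent change (or the assumed start of time).
        while t > start and not change(s)(t):
            d += DELTA
            t -= DELTA
        return d
    return at


# Hypothetical signal: low until tick 4, high from then on.
step = lambda t: 1 if t >= 4 else 0
print(duration(step)(7))
```

At the tick where the change occurs the result is δ, matching the base case of the definition.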


fall ==
(lambda (C)
  (lambda (time)
    (and (= 0 (C time))
         (= 1 (C (- time δ))))))


fww ==
(lambda (n l S)
  (lambda (time)
    (/ ((cycles-ww n l S) time) n)))


event ==
(lambda (from to S)
  (lambda (time)
    (and (equal (S time) to)
         (not (equal (S (- time δ)) to))
         (or (eql from 'any)
             (equal from (S (- time δ)))))))


gray-event ==
(lambda (S0 S1)
  (lambda (time)
    (or ((change S0) time) ((change S1) time))))


kbd-events ==
(keyboard-events kbd-state)
kbd-state ==
(samp (fall kbd-reset)
      (accumulated-bits
       (fall kbd-reset)
       (rise kbd-clk) kbd-data))


keyboard-events ==
(lambda (S)
  (lambda (time)
    (if ((stay S) time) nil
        (let ((previous (S (- time δ)))
              (current (S time)))
          (list
           (pos->key (log (logxor previous current) 2))
           (if (< previous current) 'up 'down))))))


key-is-pressed ==
(lambda (key Kbd)
  (lambda (time)
    (if (eql (list 'up key) (Kbd time)) t
        (if (eql (list 'down key) (Kbd time)) nil
            ((key-is-pressed key Kbd) (- time δ))))))


mouse-dr ==
(tsign
 (count-ww 1sec
           (gray-event mouse-left mouse-right)))


register ==
(lambda (C D)
  (lambda (time)
    (if ((fall C) time) (D time)
        ((register C D) (- time δ)))))


samp == sample-and-hold ==
(lambda (V S)
  (lambda (time)
    (if (V time) (S time)
        ((samp V S) (- time δ)))))
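The samp (sample-and-hold) definition returns the value S had at the most recent moment V was true. A minimal Python sketch follows, assuming discrete ticks with δ = 1 and adding a hypothetical `start` bound and `initial` value so that the search into the past terminates; neither is part of the definition above.

```python
DELTA = 1  # assumed tick size


def samp(v, s, start=0, initial=None):
    """Value of s at the latest tick <= time where v held, else `initial`."""
    def at(time):
        t = time
        while t >= start:
            if v(t):
                return s(t)
            t -= DELTA
        return initial  # assumed result when v has never been true
    return at


# Hypothetical signals: sample every fourth tick from a ramp.
strobe = lambda t: t % 4 == 0
ramp = lambda t: t * 10
held = samp(strobe, ramp)
print(held(6), held(8))
```

The register behavior has the same shape, with (fall C) playing the role of the sampling signal V.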
sequence ==
(lambda (l S)
  (lambda (time)
    (or (null l)
        (if ((stay S) time)
            ((sequence l S) (- time δ))
            (and (eql (car (last l)) (S (- time δ)))
                 ((sequence (butlast l) S)
                  (- time δ)))))))


sign ==
(lambda (x) (if (< x 0) '- (if (> x 0) '+ 0)))


synchronous-delay == syn-del ==
(lambda (n V S)
  (lambda (time)
    (if (V time)
        (if (= n 0)
            (S time)
            ((synchronous-delay (- n 1) V S) (- time δ)))
        ((synchronous-delay n V S) (- time δ)))))


syn-register ==
(lambda (V S) (synchronous-delay 1 V S))


toggle ==
(lambda (S)
  (lambda (time)
    (if ((fall S) time)
        (invert ((toggle S) (- time δ)))
        ((toggle S) (- time δ)))))


tsign ==
(lambda (S)
  (lambda (time)
    (sign (S time))))


two-phase-clock ==
(lambda (phi1 phi2)
  (sequence '((0 0) (1 0) (0 0) (0 1))
            (lambda (time) (list (phi1 time) (phi2 time)))))


Appendix  C 


Reset  Hold  Counter  Behavior 


Section  5.8.1  alluded  to  the  fact  that  the  temporally  abstract  behavior  for 
the  Reset  Hold  Counter  component  could  be  derived  from  the  behaviors  of 
its  subcomponents.  The  actual  transformations  are  given  here. 

Consider the behaviors of the three components of the Reset Hold Counter
(Figure C.1). The behavior of the inverter is tinvert, the behavior of the
AND gate is tand, and the k-bit counter's behavior is represented by counter
(nthbit is an auxiliary function, not a behavior).

tand ==
(lambda (X Y)
  (lambda (time)
    (if (and (eql (X time) 1) (eql (Y time) 1)) 1 0)))


counter ==
(lambda (k R C)
  (lambda (time)
    (if (eql 0 (R time)) 0
        (mod
         (+ (if ((fall C) time) 1 0)
            ((counter k R C) (- time δ)))
         (expt 2 k)))))


nthbit ==
(lambda (i n) (load-byte n i 1))
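To make the counter definition concrete, here is a small Python sketch of a k-bit counter that is cleared while R is 0 and incremented on each falling edge of C. It assumes discrete ticks with δ = 1, a state of 0 before a hypothetical `start` tick, and an iterative evaluation in place of the recursion; `nthbit` is written with a shift and mask on the assumption that it extracts bit i of n.

```python
DELTA = 1  # assumed tick size


def fall(c):
    """True at ticks where c is 0 and was 1 one tick earlier."""
    return lambda t: c(t) == 0 and c(t - DELTA) == 1


def nthbit(i, n):
    """Bit i of the non-negative integer n (bit 0 is least significant)."""
    return (n >> i) & 1


def counter(k, r, c, start=0):
    """k-bit counter: 0 while r is 0, otherwise counts falling edges of c mod 2**k."""
    def at(time):
        state = 0  # assumed state before `start`
        for t in range(start, time + 1):
            if r(t) == 0:
                state = 0
            else:
                state = (state + (1 if fall(c)(t) else 0)) % (2 ** k)
        return state
    return at


# Hypothetical inputs: reset never asserted, clock high on odd ticks.
reset = lambda t: 1
clock = lambda t: t % 2
print(counter(3, reset, clock)(10), counter(2, reset, clock)(10))
```

With this clock the falling edges land on even ticks, so six edges are seen by tick 10 and the 2-bit counter has wrapped once.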
Figure C.1: Reset Hold Counter


The behavior of the connected group of components is represented by
rh-state, which returns a signal representing the state of the Reset Hold
Counter. For the most part it is simply a composition of the counter, tand,
and tinvert behaviors that reflects the circuit structure. The signal
argument to tinvert is a delayed version of the most significant bit of state
and prevents rh-state from being circularly defined; the delay could have been
introduced anywhere in the loop. The behavior rh is then the behavior of the
entire aggregate structure; it is simply the most significant bit of the state.

rh ==
(lambda (R C)
  (lambda (time)
    (nthbit 13 ((rh-state R C) time))))


rh-state ==
(lambda (R C)
  (counter 14 R
           (tand C
                 (tinvert
                  (lambda (time)
                    (nthbit 13
                            ((rh-state R C) (- time delta))))))))
Defining the behavior rh and its underlying behavior rh-state does not
simplify anything; it merely composes the several behaviors into one.

rh-state ==
(lambda (R C)
  (counter 14 R
           (tand C
                 (tinvert
                  (lambda (time)
                    (nthbit 13
                            ((rh-state R C) (- time delta))))))))

rh ==
(lambda (R C)
  (lambda (time)
    (nthbit 13 ((rh-state R C) time))))

The following transformations simplify rh-state's definition so that it
takes on values from 0 to 2^13 instead of 0 to 2^14:

The use of counter is removed by substitution:


(lambda (R C)
  ((lambda (k R C)
     (lambda (time)
       (if (eql 0 (R time)) 0
           (mod
            (+ (if ((fall C) time) 1 0)
               ((counter k R C) (- time delta)))
            (expt 2 k)))))
   14 R
   (tand C
         (tinvert
          (lambda (time)
            (nthbit 13
                    ((rh-state R C) (- time delta))))))))


Substitution for k, R, and C promotes the (eql 0 (R time)) condition:

(lambda (R C)
  (lambda (time)
    (if (eql 0 (R time)) 0
        ((lambda (CC)
           (mod
            (+ (if ((fall CC) time) 1 0)
               ((counter 14 R CC) (- time delta)))
            (expt 2 14)))
         (tand
          C (tinvert
             (lambda (time)
               (nthbit
                13 ((rh-state R C)
                    (- time delta))))))))))

The term (counter 14 R CC) is equivalent to (rh-state R C) and can
be substituted:


(lambda (R C)
  (lambda (time)
    (if (eql 0 (R time)) 0
        ((lambda (CC)
           (mod
            (+ (if ((fall CC) time) 1 0)
               ((rh-state R C) (- time delta)))
            (expt 2 14)))
         (tand
          C (tinvert
             (lambda (time)
               (nthbit
                13 ((rh-state R C)
                    (- time delta))))))))))


With only one reference to CC remaining, it can be substituted for:

(lambda (R C)
  (lambda (time)
    (if (eql 0 (R time)) 0
        (mod
         (+ (if ((fall
                  (tand
                   C (tinvert
                      (lambda (time)
                        (nthbit
                         13 ((rh-state R C)
                             (- time delta)))))))
                 time) 1 0)
            ((rh-state R C) (- time delta)))
         (expt 2 14)))))


We can now case split on whether the term (nthbit 13 ...) is 1 or 0:


(lambda (R C)
  (lambda (time)
    (if (eql 0 (R time)) 0
        (if (eql 0 (nthbit
                    13 ((rh-state R C) (- time delta))))
            (mod
             (+ (if ((fall C)
                     time) 1 0)
                ((rh-state R C) (- time delta)))
             (expt 2 14))
            (mod
             (+ (if ((fall (lambda (time) 0))
                     time) 1 0)
                ((rh-state R C) (- time delta)))
             (expt 2 14))))))


Simplifying the else-part of the resulting condition yields:

(lambda (R C)
  (lambda (time)
    (if (eql 0 (R time)) 0
        (if (eql 0 (nthbit
                    13 ((rh-state R C) (- time delta))))
            (mod
             (+ (if ((fall C)
                     time) 1 0)
                ((rh-state R C) (- time delta)))
             (expt 2 14))
            ((rh-state R C) (- time delta))))))

The condition (eql 0 (nthbit 13 z)) can be expressed in an alternative
way as (< z (expt 2 13)), written below with the arguments in the opposite
order as (> (expt 2 13) z):


(lambda (R C)
  (lambda (time)
    (if (eql 0 (R time)) 0
        (if (> (expt 2 13)
               ((rh-state R C) (- time delta)))
            (mod
             (+ (if ((fall C)
                     time) 1 0)
                ((rh-state R C) (- time delta)))
             (expt 2 14))
            ((rh-state R C) (- time delta))))))
This allows us to drop the mod term from the if-part, since the state is
below 2^13 there and adding at most 1 cannot reach 2^14:


(lambda (R C)
  (lambda (time)
    (if (eql 0 (R time)) 0
        (if (> (expt 2 13)
               ((rh-state R C) (- time delta)))
            (+ (if ((fall C)
                    time) 1 0)
               ((rh-state R C) (- time delta)))
            ((rh-state R C) (- time delta))))))


Finally, the conditional can be formulated as a min expression:

(lambda (R C)
  (lambda (time)
    (if (eql 0 (R time)) 0
        (min (expt 2 13)
             (+ (if ((fall C) time) 1 0)
                ((rh-state R C) (- time delta)))))))
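The chain of rewrites can be sanity-checked by simulating both ends of it. The Python sketch below compares the directly composed behavior (a k-bit counter clocked through an AND gate that is disabled by the delayed most significant bit) with the min form just derived. It is an illustrative check under stated assumptions, not the thesis's procedure: time is discrete with delta equal to one tick, the state is 0 before tick 0, Reset is asserted only at tick 0, and a small width k = 4 stands in for the 14 bits of the real circuit so the saturation point is reached quickly.

```python
def fall_at(sig, t):
    """True when sig is 0 at tick t and was 1 at tick t - 1."""
    return sig(t) == 0 and sig(t - 1) == 1


def simulate(k, reset, clock, ticks):
    """Return (composed, simplified) state traces for a k-bit Reset Hold Counter."""
    msb = k - 1
    limit = 2 ** msb
    comp = {-2: 0, -1: 0}  # assumed state before tick 0
    simp = {-1: 0}
    # Clock gated by the inverted, one-tick-delayed most significant bit.
    gated = lambda t: clock(t) & (1 - ((comp[t - 1] >> msb) & 1))
    for t in range(ticks):
        if reset(t) == 0:
            comp[t] = simp[t] = 0
            continue
        edge = 1 if (gated(t) == 0 and gated(t - 1) == 1) else 0
        comp[t] = (comp[t - 1] + edge) % (2 ** k)
        simp[t] = min(limit, (1 if fall_at(clock, t) else 0) + simp[t - 1])
    return [comp[t] for t in range(ticks)], [simp[t] for t in range(ticks)]


clock = lambda t: t % 2                # hypothetical clock, high on odd ticks
reset = lambda t: 0 if t == 0 else 1   # Reset asserted (0) only at tick 0
composed, simplified = simulate(4, reset, clock, 40)
print(composed == simplified, composed[-1])
```

Both traces count falling edges of the clock and then hold at 2^(k-1). The check deliberately avoids releasing Reset from a saturated state, where the one-tick delay in the feedback path could make the two forms differ briefly.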

The following schema says that moment-by-moment conditional counting
of ?Y can be replaced with "jumps" of duration ?n, when ?X is periodic and
?F is monotonic:

?SELF ==
(lambda (?X ?Y)
  (lambda (time)
    (if (?X time) 0
        (?F (+ (if (?Y time) 1 0)
               ((?SELF ?X ?Y) (- time delta)))))))


?SELF ==
(lambda (?X ?Y)
  (lambda (time)
    (if (?X time) 0
        (let ((N (count-ww ?n ?Y)))
          (?F (if (< n ((duration ?X) time))
                  (+ (N time) ((?SELF ?X ?Y) (- time ?n)))
                  (* (N time)
                     (/ ((duration ?X) time) ?n))))))))

When this transformation is applied to a rewritten definition of rh-state
called new-rh-state, the following results:

rh-state ==
(lambda (R C)
  ((lambda (LR FC)
     (lambda (time)
       (if (LR time) 0
           ((lambda (x) (min (expt 2 13) x))
            (+ (if (FC time) 1 0)
               ((new-rh-state LR FC) (- time delta)))))))
   (lambda (time) (eql 0 (R time)))
   (fall C)))
Which becomes, with ?n still unbound:

new-rh-state ==
(lambda (R C)
  ((lambda (LR FC)
     (lambda (time)
       (if (LR time) 0
           (let ((NFC (count-ww ?n FC)))
             ((lambda (x) (min (expt 2 13) x))
              (if (< n ((duration LR) time))
                  (+ (NFC time)
                     ((rh-state LR NFC) (- time n)))
                  (* (NFC time)
                     (/ ((duration LR) time) n))))))))
   (lambda (time) (eql 0 (R time)))
   (fall C)))

We can make further use of the assumption that C is periodic by using
the frequency temporal abstraction to describe C, and expressing rh-state
in terms of that abstraction. The transformations required to do the latter
are as follows; first the original term (fall C) is substituted for the FC
argument:

rh-state ==
(lambda (R C)
  ((lambda (LR)
     (lambda (time)
       (if (LR time) 0
           (let ((NFC (count-ww ?n (fall C))))
             ((lambda (x) (min (expt 2 13) x))
              (if (< n ((duration LR) time))
                  (+ (NFC time)
                     ((rh-state LR NFC) (- time n)))
                  (* (NFC time)
                     (/ ((duration LR) time) n))))))))
   (lambda (time) (eql 0 (R time)))))

The term (cycles-ww n '(0 1) C) is then substituted for the equivalent
term (count-ww n (fall C)):

rh-state ==
(lambda (R C)
  ((lambda (LR)
     (lambda (time)
       (if (LR time) 0
           (let ((NFC (cycles-ww ?n '(0 1) C)))
             ((lambda (x) (min (expt 2 13) x))
              (if (< n ((duration LR) time))
                  (+ (NFC time)
                     ((rh-state LR NFC) (- time n)))
                  (* (NFC time)
                     (/ ((duration LR) time) n))))))))
   (lambda (time) (eql 0 (R time)))))


The cycles-ww abstraction can be reformulated in terms of fww as follows:


rh-state ==
(lambda (R C)
  ((lambda (LR)
     (lambda (time)
       (if (LR time) 0
           (let ((NFC (lambda (time)
                        (* n ((fww ?n '(0 1) C) time)))))
             ((lambda (x) (min (expt 2 13) x))
              (if (< n ((duration LR) time))
                  (+ (NFC time)
                     ((rh-state LR NFC) (- time n)))
                  (* (NFC time)
                     (/ ((duration LR) time) n))))))))
   (lambda (time) (eql 0 (R time)))))


Now NFC can be substituted into the body:

rh-state ==
(lambda (R C)
  ((lambda (LR)
     (lambda (time)
       (if (LR time) 0
           ((lambda (x) (min (expt 2 13) x))
            (if (< n ((duration LR) time))
                (+ (* n ((fww ?n '(0 1) C) time))
                   ((rh-state LR NFC) (- time n)))
                (* (* n ((fww ?n '(0 1) C) time))
                   (/ ((duration LR) time) n)))))))
   (lambda (time) (eql 0 (R time)))))
and the common subexpression promoted, with a simplification in the
else-part of the if:


rh-state ==
(lambda (R C)
  ((lambda (LR)
     (lambda (time)
       (if (LR time) 0
           ((lambda (x) (min (expt 2 13) x))
            (let ((f ((fww ?n '(0 1) C) time)))
              (if (< n ((duration LR) time))
                  (+ (* n f)
                     ((rh-state LR NFC) (- time n)))
                  (* f ((duration LR) time))))))))
   (lambda (time) (eql 0 (R time)))))


Since R only takes on the values 0 and 1, LR can be removed:


rh-state ==
(lambda (R C)
  (lambda (time)
    (if (eql 0 (R time)) 0
        ((lambda (x) (min (expt 2 13) x))
         (let ((f ((fww ?n '(0 1) C) time)))
           (if (< n ((duration R) time))
               (+ (* n f) ((rh-state R NFC) (- time n)))
               (* f ((duration R) time))))))))


Finally, the assumption that C is periodic can be used. If C is periodic,
then f must be a constant and ?n is infinite. These substitutions yield:

rh-state ==
(lambda (R C)
  (lambda (time)
    (if (eql 0 (R time)) 0
        ((lambda (x) (min (expt 2 13) x))
         (if (< infinity ((duration R) time))
             (+ (* infinity f)
                ((rh-state R NFC) (- time infinity)))
             (* f ((duration R) time)))))))


A final transformation removes the if statement since its condition is
always nil, and substitutes for x (the latter could have been done earlier):


rh-state ==
(lambda (R C)
  (lambda (time)
    (if (eql 0 (R time)) 0
        (min (expt 2 13) (* f ((duration R) time))))))
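The closed form says that, while R is 1, the state grows linearly with the time since Reset was released and saturates at 2^13. A short Python sketch of that reading follows. The assumptions are mine: f is expressed in falling edges per tick, the duration of R is given directly as a tick count (the R = 0 case, which yields 0, is left to the caller), and the output bit is taken to be 1 exactly when the state has saturated.

```python
def rh_state(f, reset_high_ticks, msb=13):
    """Closed form: min(2**msb, f * duration), with f in falling edges per tick."""
    return min(2 ** msb, f * reset_high_ticks)


def rh(f, reset_high_ticks, msb=13):
    """Output bit under the assumed reading: 1 once the state reaches 2**msb."""
    return 1 if rh_state(f, reset_high_ticks, msb) >= 2 ** msb else 0


f = 0.5  # hypothetical clock: one falling edge every two ticks
print(rh_state(f, 100), rh(f, 100))
print(rh_state(f, 20000), rh(f, 20000))
```

Under these assumptions the output goes high 2^13 / f ticks after Reset is released, which is the kind of coarse prediction the temporal abstraction is meant to support.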
Section 5.8.2 alluded to the derivation of the temporally abstract behavior
of the Audio Counter; this derivation is presented here.

While the Reset Hold Counter's Reset input starts the counter back at
0 whenever asserted, in the Audio Counter only the first 1-to-0 transition of
the Start signal matters. Eighteen clock cycles must pass before the "start"
state can be reached again: while counting, it is insensitive to the Start
signal. One consequence is that while the transformation from the directly
composed behavior of the Reset Hold Counter to a simplified behavior was
tedious but straightforward, the simplified behavior of the Audio Counter is
not much of an improvement over the composed behavior, and seems to be
derivable only by expanding the behavior to an eighteen-way case split and
then collapsing it.

The four-bit counters are both wired to load "14" when the Load signal
goes low:

k-bit-counter-with-synchronous-clear-state ==
(lambda (k D L P T C)
  (lambda (time)
    (let ((previous ((self L P T C) (- time δ))))
      (if ((rise C) time)
          (if (eql 0 (L time)) (D time)
              (mod (expt 2 k)
                   (+ (if (and (eql 1 (P time))
                               (eql 1 (T time)))
                          1 0)
                      previous)))
          previous))))
four-bit-counter-with-synchronous-clear-state ==
(lambda (L P T C)
  (k-bit-counter-with-synchronous-clear-state
   4 (lambda (time) 14) L P T C))

The composition and simplification of the behaviors of those two counters
results in the following similar behavior:

eight-bit-counter-with-synchronous-clear-state ==
(lambda (L P T C)
  (k-bit-counter-with-synchronous-clear-state
   8 (lambda (time) (- 64 18)) L P T C))

Including the feedback signal Msb results in the following composed
definition:

eighteen-counter ==
(lambda (S C)
  (nthbit 8 (rising-edge-eighteen-counter-state S C)))


rising-edge-eighteen-counter-state ==
(lambda (S C)
  (lambda (time)
    (let ((L (lambda (time)
               (nthbit
                8 ((rising-edge-eighteen-counter-state S C)
                   (- time δ))))))
      (eight-bit-counter-with-synchronous-clear-state
       (tinvert (tnor S L))
       L (lambda (time) 1) C))))

Finally, after many transformations the following simplified definition
results:

rising-edge-eighteen-counter-state ==
(lambda (S C)
  (lambda (time)
    (let ((previous
           ((rising-edge-eighteen-counter-state S C)
            (- time δ))))
      (if ((rise C) time)
          (if (eql 0 (S time)) (- 64 18)
              (if (eql previous 0) 0
                  (mod (+ 1 previous) 64)))
          previous))))
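The eighteen-cycle property can be read off the simplified definition by stepping it. The Python sketch below follows the definition as printed, one call per tick; the function and variable names are illustrative only, and the caller is assumed to supply whether a rising clock edge occurs at the tick and the value of Start.

```python
LOAD = 64 - 18  # value loaded when Start is low at a rising clock edge


def step(previous, rising_edge, start):
    """One tick of the simplified rising-edge-eighteen-counter-state."""
    if not rising_edge:
        return previous
    if start == 0:
        return LOAD
    if previous == 0:
        return 0
    return (previous + 1) % 64


def edges_until_zero():
    """Rising edges needed to return to state 0 after a single load."""
    state = step(0, True, 0)  # Start low at a rising edge loads 46
    edges = 0
    while state != 0:
        state = step(state, True, 1)  # Start held high while counting
        edges += 1
    return edges


print(edges_until_zero())
```

With Start held high after the load, the state walks from 46 up through 63 and wraps to 0, where it stays until Start goes low again at a rising edge.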
Some temporal abstractions that applied to the Reset Hold Counter can
be applied to this simplified behavior; however, the assumptions on which
they depend are violated by the normal usage of the circuit and so the
resulting temporally abstract behaviors have little predictive force. For
example, while the signal Msb is a constant 1, the Audio Counter forms a
frequency divider with respect to the Clock input; however, the clocks come
in bursts of 18 and normally the Start line goes low at least once per burst;
the "frequencies" are thus not constant, but rather are defined over so few
cycles as to be useless. For another example, the "counting" behavior of the
Audio Counter can be captured by (* (duration S) (fww n '(1 0) C))
only during the bursts of 18 clock cycles and hence is similarly useless.

The behavior as shown above is not event-preserving with respect to S:
any number of events could happen while C had no rising edges, and in that
case Msb would not change. However, the Sampling abstraction, when applied
to the Start, Load, and Msb signals with respect to the temporally abstract
signal (rise Clock), yields the following slightly modified behavior that is
event-preserving:

rising-edge-eighteen-counter-state ==
(lambda (S C)
  ((lambda (SS)
     (lambda (time)
       (let ((previous
              ((rising-edge-eighteen-counter-state S C)
               (- time δ))))
         (if ((rise C) time)
             (if (eql 0 (SS time)) (- 64 18)
                 (if (eql previous 0) 0
                     (mod (+ 1 previous) 64)))
             previous))))
   (samp (rise C) S)))

(lambda (SS) ...) is event-preserving, to the extent that n falling edges
on (samp (rise C) S) will result in somewhere between ⌊n/18⌋ and n falling
edges on Msb. This is because (samp (rise C) S) can only change at the
same moments that C rises. Thus the number of falls on Msb (measured with
respect to rising edges of Clock) is bounded as follows:

((count-ww n (fall (samp (rise Clock) Start))) time)
  ≥
((count-ww n (fall (samp (rise Clock) Msb))) time)
  ≥
(floor ((count-ww n (fall (samp (rise Clock) Start))) time)
       18)


Appendix  E 

The  Switch  Level  Model 


The lowest level of circuit description in BASIL is a switch level model. The
primitive elements of the model are pins, etches, resistors, switches, and
voltage-controlled switches (that is, transistors). The model uses voltages
in the set {0,1} and currents in the set {-,0,+} with 0 meaning
"negligible." This models the steady-state digital behavior of simple analog
elements. The digital current model is needed because circuit boards contain
physical switches, jumpers, and resistors, whose behavior cannot be modeled
adequately by a gate-level digital model.


E.1 Pins and other Connections

Behaviorally, the simplest elements are connections, which have ports at
two ends. Working connections transmit certain signals unchanged from
one port to the other. The signals thus transmitted are called ordinary
signals; voltage is the most primitive such signal, and most abstractions
of it including logic-level are ordinary signals as well. Using a demon
facility instead of rules, each signal that appears at one end of a connection
can result in that signal getting equated (via tsame) to the corresponding
signal at the other end. For example, as long as the connection c is working,
the logic-level signals at either end carry the same value:

[conn c (out 0 a) (in 0 b)]
[status-of c working]
Signal (ll (out 0 a)) exists
(ll (out 0 a)) is an ordinary signal
→
[tsame -oo +oo (ll (out 0 a)) (ll (in 0 b))]

Pins  are  a  kind  of  connection,  and  they  transmit  ordinary  signals  in  this 
fashion. 

Just as there are logic-level signals denoted (ll X) and representing a
function from time to {0,1}, there are qualitative-current-into signals
denoted (qci X). Qualitative currents range over {-, 0, +}, and have the
arithmetic operations qplus and qminus with their usual meanings. Pins obey a
qualitative version of Kirchhoff’s current law (KCL); that is, the sum of the
currents into the pin must be 0:

If   [conn (pin ?n ?chip) ?source ?sink]
and  [status-of (pin ?n ?chip) working]
and  [thru ?l ?u (qci ?source) ?i]
Then [thru ?l ?u (qci ?sink) (qminus ?i)]

If   [conn (pin ?n ?chip) ?source ?sink]
and  [status-of (pin ?n ?chip) working]
and  [thru ?l ?u (qci ?sink) ?i]
Then [thru ?l ?u (qci ?source) (qminus ?i)]
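
The qualitative arithmetic used by these rules can be sketched in a few lines of Python. This is only an illustration, not TINT's implementation: the integer encoding of {-, 0, +} and the use of None for the indeterminate sum of opposite signs are assumptions of the sketch, and pin_other_side is a hypothetical helper name.

```python
from typing import Optional

# Qualitative values encoded as integers (an assumption of this sketch):
# -1 stands for "-", 0 for "0" (negligible), +1 for "+".

def qminus(i: int) -> int:
    """Qualitative negation: flips the sign and leaves 0 unchanged."""
    return -i

def qplus(a: int, b: int) -> Optional[int]:
    """Qualitative sum. Adding opposite signs has no determinate sign,
    which this sketch reports as None."""
    if a == 0:
        return b
    if b == 0 or a == b:
        return a
    return None

def pin_other_side(i: int) -> int:
    """KCL for a working pin: the current into one side is the negative
    of the current into the other side."""
    return qminus(i)
```

For instance, pin_other_side(1) yields -1, mirroring the deduction made by either of the two pin rules above.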

Etches obey rules similar to those for pins, although they can have any
number of ports, denoted (hole 1 ...), (hole 2 ...), and so forth. The number
of ports on an etch is referred to as its “arity.” To transmit ordinary
signals, n rules could be written for each arity: one that says that the value
at hole 1 is the same as at hole 2, another that the value at hole 2 is the
same as at hole 3, and so forth. Since some etches have several dozen ports,
this is impractical and inefficient. Instead, BASIL defines for each etch a
distinguished port not corresponding to any physical boundary, which TINT
connects to each hole by a binary connection. For example, suppose etch n119
has arity 3. In addition to its three ports (hole 1 n119), (hole 2 n119), and
(hole 3 n119), it has a port LL119 to which all three are connected.
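
The hub arrangement can be pictured with a short Python sketch. The function name hub_connections and the port label "(hub ...)" are hypothetical, chosen only for illustration; the source does not say how the distinguished port is named in general.

```python
def hub_connections(etch: str, arity: int) -> list[tuple[str, str]]:
    """Return one binary connection per hole, each joining that hole to a
    single distinguished hub port of the etch."""
    hub = f"(hub {etch})"  # hypothetical label for the distinguished port
    return [(f"(hole {k} {etch})", hub) for k in range(1, arity + 1)]

# An etch of arity 3 needs exactly three binary connections.
connections = hub_connections("e1", 3)
```

Because every hole is tied to the hub by an ordinary binary connection, the same signal-equating machinery used for pins serves etches of any arity.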
Etches also have qualitative KCL rules, and the rules for an etch with 3
holes are shown below; n-ary etches require n rules of this form:

If   [isa ?e etch]
and  [status-of ?e working]
and  [thru ?l2 ?u2 (qci (hole 2 ?e)) ?i2]
and  [thru ?l3 ?u3 (qci (hole 3 ?e)) ?i3]
and  (overlap (?l2 ?u2) (?l3 ?u3))
Then [thru (max ?l2 ?l3) (min ?u2 ?u3)
           (qci (hole 1 ?e)) (qminus (qplus ?i2 ?i3))]

If   [isa ?e etch]
and  [status-of ?e working]
and  [thru ?l1 ?u1 (qci (hole 1 ?e)) ?i1]
and  [thru ?l3 ?u3 (qci (hole 3 ?e)) ?i3]
and  (overlap (?l1 ?u1) (?l3 ?u3))
Then [thru (max ?l1 ?l3) (min ?u1 ?u3)
           (qci (hole 2 ?e)) (qminus (qplus ?i1 ?i3))]

If   [isa ?e etch]
and  [status-of ?e working]
and  [thru ?l1 ?u1 (qci (hole 1 ?e)) ?i1]
and  [thru ?l2 ?u2 (qci (hole 2 ?e)) ?i2]
and  (overlap (?l1 ?u1) (?l2 ?u2))
Then [thru (max ?l1 ?l2) (min ?u1 ?u2)
           (qci (hole 3 ?e)) (qminus (qplus ?i1 ?i2))]
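
As a rough illustration of what one such etch rule computes, the Python sketch below combines two known hole currents over their common time interval. It is a sketch under assumptions, not the TINT code: intervals are (lower, upper) pairs, overlap is taken to require a nonempty intersection, qualitative values are encoded as -1, 0, +1, and third_hole_current is a hypothetical name.

```python
from typing import Optional, Tuple

Interval = Tuple[float, float]  # (lower, upper) bounds of a time interval

def overlap(a: Interval, b: Interval) -> Optional[Interval]:
    """Intersection of two intervals: max of lowers, min of uppers.
    Returns None when the intervals do not overlap."""
    lower, upper = max(a[0], b[0]), min(a[1], b[1])
    return (lower, upper) if lower < upper else None

def qplus(a: int, b: int) -> Optional[int]:
    """Qualitative sum; None marks the indeterminate case (+ plus -)."""
    if a == 0:
        return b
    if b == 0 or a == b:
        return a
    return None

def third_hole_current(i2: int, t2: Interval, i3: int, t3: Interval):
    """Given currents into two holes of a working 3-hole etch, deduce the
    current into the remaining hole over the shared interval."""
    common = overlap(t2, t3)
    if common is None:
        return None  # the two facts never hold at the same time
    total = qplus(i2, i3)
    if total is None:
        return None  # opposite signs: no qualitative conclusion
    return common, -total  # qminus of the qualitative sum
```

With a current of + into hole 2 over (0, 10) and 0 into hole 3 over (5, 20), the sketch concludes that the current into hole 1 is - over (5, 10).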

E.2  Resistors 

Resistors have (i) positive resistance, (ii) two ports (bi 1 ...) and
(bi 2 ...), and (iii) rules enforcing KCL. The mode of a resistor is normal
if the resistor is working. While in normal mode it obeys a qualitative version
of Ohm’s law embodied as two rules. First, the current into a resistor has
the same sign as the voltage drop across it:
If   [isa ?r resistor]
and  [thru ?l1 ?u1 (mode ?r) normal]
and  [thru ?l2 ?u2 (ll (bi 1 ?r)) ?v1]
and  (overlap (?l1 ?u1) (?l2 ?u2))
and  [thru ?l3 ?u3 (ll (bi 2 ?r)) ?v2]
and  (overlap (?l1 ?u1) (?l2 ?u2) (?l3 ?u3))
Then [thru (max ?l1 ?l2 ?l3) (min ?u1 ?u2 ?u3)
           (qci (bi 1 ?r)) (sign (- ?v1 ?v2))]

Second,  if  there  is  no  current  flowing  into  a  resistor  then  there  is  no 
voltage  drop  across  it;  that  is,  the  logic-levels  at  both  ends  are  the  same: 

If   [isa ?r resistor]
and  [thru ?l1 ?u1 (mode ?r) normal]
and  [thru ?l2 ?u2 (qci (bi ?n ?r)) 0]
and  (overlap (?l1 ?u1) (?l2 ?u2))
Then [tsame (max ?l1 ?l2) (min ?u1 ?u2)
            (ll (bi 1 ?r)) (ll (bi 2 ?r))]
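
Ignoring the time intervals, the two resistor rules reduce to the following Python sketch. The function names are hypothetical and the integer encoding of qualitative currents (-1, 0, +1) is an assumption made for illustration.

```python
from typing import Optional

def sign(x: int) -> int:
    """Sign of an integer as -1, 0, or +1."""
    return (x > 0) - (x < 0)

def current_into_port1(v1: int, v2: int) -> int:
    """First rule: the current into port 1 of a normal resistor has the
    sign of the logic-level drop from port 1 to port 2."""
    return sign(v1 - v2)

def level_at_other_end(known_level: int, current: int) -> Optional[int]:
    """Second rule: zero current means both ends share one logic-level.
    For nonzero current this sketch draws no conclusion (None)."""
    return known_level if current == 0 else None
```

For a pull-up with port 1 at logic-level 1 and port 2 at 0, current_into_port1(1, 0) gives +, so current enters at the high end and leaves at the low end.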

The second rule could be generalized; nonzero current flowing into a
resistor implies that there must be a voltage drop across it. In the
implementation, however, every resistor in the Console Controller Board has
one end connected to Vdd, so that the two rules above were sufficient and the
more general version was never needed.


E.3  Switches 

Switches appear on circuit boards in various guises: as jumpers, as buttons,
or as literal switches whose position the user sets. An ordinary switch has
two ports (bi 1 ...) and (bi 2 ...), and two modes, open and shut. In
these two modes it has infinite or negligible resistance, respectively.
There are three rules describing the behavior of switches. First, if a switch
is open then all the currents into it are 0:

If   [isa ?s switch]
and  [thru ?l ?u (mode ?s) open]
Then [thru ?l ?u (qci (bi 1 ?s)) 0]
and  [thru ?l ?u (qci (bi 2 ?s)) 0]
Second, if a switch is shut then there is no voltage drop across it; the
logic-levels at its ports are the same:

If   [isa ?s switch]
and  [thru ?l ?u (mode ?s) shut]
Then [tsame ?l ?u (ll (bi 1 ?s)) (ll (bi 2 ?s))]

Third, if a switch is shut it obeys KCL; that is, if the current into one
port is known then the current into the other is its negative:

If   [isa ?s switch]
and  [thru ?l1 ?u1 (mode ?s) normal]
and  [thru ?l2 ?u2 (qci (bi ?n ?s)) ?i]
and  (overlap (?l1 ?u1) (?l2 ?u2))
Then [thru (max ?l1 ?l2) (min ?u1 ?u2)
           (qci (bi (- 3 ?n) ?s)) (qminus ?i)]

A typical circuit structure encountered on digital boards is a combination
of a switch to ground and a resistor to a constant high voltage (Figure E.1).
When the switch is open, the logic-level of node V goes to 1; when shut it is
0 and current flows out of the resistor through the switch to ground.

Since the resistor and switch typically belong to different field-replaceable
units, it is important in a troubleshooting context for TINT to be able to
model at this level of detail.
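
The switch-resistor combination can be traced by hand using the resistor and switch rules given earlier; the Python sketch below follows that chain of deductions. It is illustrative only: the function name pullup_node, the dictionary keys, and the integer encoding of qualitative currents are assumptions of the sketch.

```python
def sign(x: int) -> int:
    """Sign of an integer as -1, 0, or +1."""
    return (x > 0) - (x < 0)

def pullup_node(switch_mode: str) -> dict:
    """Deduce the logic-level at node V and the qualitative currents for a
    resistor to a high voltage combined with a switch to ground."""
    HIGH, GROUND = 1, 0
    if switch_mode == "open":
        qci_switch = 0                       # open switch: no current into it
        qci_resistor_node_end = -qci_switch  # KCL at node V
        ll_v = HIGH                          # zero current: no drop across resistor
    elif switch_mode == "shut":
        ll_v = GROUND                        # shut switch: both ports at same level
        qci_resistor_high_end = sign(HIGH - ll_v)       # qualitative Ohm's law
        qci_resistor_node_end = -qci_resistor_high_end  # KCL for the resistor
        qci_switch = -qci_resistor_node_end             # KCL at node V
    else:
        raise ValueError("switch_mode must be 'open' or 'shut'")
    return {
        "ll_V": ll_v,
        "qci_switch": qci_switch,
        "qci_resistor_node_end": qci_resistor_node_end,
    }
```

In the shut case the current into the resistor's node end is -, that is, current flows out of the resistor and on through the switch to ground.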

This level of detail would also be useful for proper handling of failures
such as solder bridges and other kinds of “shorts.” Although handling of
shorts is not implemented, some of the necessary behavior models are in fact
included in TINT and so are presented here for completeness.

Transistors are modeled as voltage-controlled switches. Their rules are
similar to those for switches, except that the logic-level at their g port
determines whether they are open or shut:

If   [isa ?x transistor]
and  [status-of ?x working]
and  [thru ?l ?u (ll (in g ?x)) ?v]
Then [thru ?l ?u (mode ?x)
           (if (eql ?v 0) 'open 'shut)]
Figure E.1: Typical Switch-Resistor Combination


The  resistor  and  transistor  models  can  be  composed  to  form  behavior 
models  of  ordinary  digital  components  such  as  logic  gates.  The  advantage  of 
this  level  of  detail  is  that  the  effects  of  faults  that  cause  shorts  between  signals 
(other  than  power  signals)  can  be  correctly  modeled.  Using  the  standard 
digital  model,  for  example,  the  logic-level  output  of  a  working  TTL  inverter 
must  be  1  if  the  input  is  0,  and  if  it  is  not,  then  the  inverter  must  be  broken. 
By  taking  currents  into  account,  the  more  accurate  prediction  can  be  made 
that  if  the  input  logic-level  is  0,  the  output  current  is  0.  Hence,  if  the  output 
logic-level  is  0  instead  of  1,  it  is  not  a  necessary  logical  consequence  that  the 
inverter  is  broken;  something  else  could  be  pulling  the  output  node  down. 

Using the switch model the behavior of a TTL inverter can be summarized
as follows: if the input current is 0 the output voltage will be 0; if the input
voltage is 0 the output current will be 0. Figure E.2 shows how the qualitative
models of resistors and switches described above can be organized so as to
reproduce this behavior.

Figure E.2: TTL Inverter as Modeled with TINT


•  If (ll Input) is 0, then the difference between (ll (bi 1 R)) and
(ll (bi 2 R)) is 1, so (qci (bi 2 R)) is -. Hence, current is flowing out
of the resistor and back towards the gate driving this one. The switch
is shut and so (ll (bi 1 T)) has the same value as (ll (bi 2 T)),
hence (ll Output) is 0.

•  If (qci (bi 2 R)) is 0, then (ll (bi 2 R)) must be pulled up
to 1, the same as (ll (bi 1 R)). This makes the switch open, so
(qci (bi 2 T)) is 0, and the gate being driven will make (ll Output)
be 1.

Similarly, Figure E.3 shows the model of an nMOS inverter in TINT. In
nMOS, the current normally flowing into the device from the input is 0, and
likewise for the current into the output.

Figure E.3: nMOS Inverter as Modeled with TINT


•  If (ll Input) is 1 then the switch is shut, so (ll (bi 2 T)) has the
same value as (ll (bi 1 T)), hence (ll Output) is 0.

•  If (ll Input) is 0 then the switch is open, so (qci (bi 2 T)) is 0.
Hence (qci (bi 2 R)) is 0 and hence (ll (bi 2 R)) must be 1.

Similar models apply to NAND and NOR gates in both technologies. A
tristate driver in either technology can be described as an nMOS inverter with
a transistor interposed between the pullup resistor and the output node.

The disadvantage of this level of detail is that while the digital model
allows the behavior of a given group of boolean gates to be easily predicted
using straightforward local propagation, this cannot be done in general in the
switch model. At every signal fanout in TTL or wired-OR in nMOS, local
propagation stalls, and the solutions to that problem all have unfortunate
side effects. This is a standard problem with local propagation schemes; what
is different about this case is that it is guaranteed to be ubiquitous at the
switch level of detail.
For example, the rules shown so far cannot deduce that the node X in
Figure E.4 must have logic-level 1, nor that the currents into the resistors
must be 0. This is because either one of those facts must be known before
the other can be deduced; this is termed an impasse.

Figure E.4: Impasse Example
[Figure labels: LL = 1; (qci Y) = 0; X; (qci Z) = 0]

One solution is to enumerate the possible values of logic-level at X (there
are only two) in hopes that all but one can be ruled out. In this case, 0
is inconsistent because it would require the sum of currents into X to be
positive. Thus, the logic-level must be 1. This is a terrible solution in
general, because it can lead to combinatorial explosion among choices made
for different quantities over different time intervals. TINT does not use this
solution.
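
The enumeration just described (which TINT does not use) can be sketched as follows. The sketch assumes one reading of Figure E.4: some number of pull-up resistors join a logic-level 1 node to X, and every other current into X is known to be 0. The function name and the integer encoding of qualitative currents are hypothetical.

```python
def sign(x: int) -> int:
    """Sign of an integer as -1, 0, or +1."""
    return (x > 0) - (x < 0)

def consistent_levels_at_x(n_resistors: int = 2, other_currents=(0, 0)) -> list:
    """Try each candidate logic-level at X and keep those that do not
    violate qualitative KCL at the node."""
    HIGH = 1
    survivors = []
    for x in (0, 1):
        # Each pull-up delivers into X a current with the sign of the
        # drop across it (qualitative Ohm's law).
        from_resistor = sign(HIGH - x)
        currents = [from_resistor] * n_resistors + list(other_currents)
        has_positive = any(c > 0 for c in currents)
        has_negative = any(c < 0 for c in currents)
        # If all nonzero currents share one sign, their sum cannot be 0.
        if has_positive != has_negative:
            continue
        survivors.append(x)
    return survivors
```

Under these assumptions only logic-level 1 survives. The cost of the approach shows up when many such choices interact, since candidates for different quantities and time intervals multiply.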

A second solution is to recognize that R1 and R2 are in parallel, and since
their resistances are positive, the resistance between the high voltage
and X is positive too. In effect, there is just a single resistor between the
two nodes: a slice [Sussman77] [Sussman80]. In BASIL terminology, there is
a functional component including R1, R2, and etch X, and its behavior rule
recognizes the above situation and just assigns the logic-level 1 to X. TINT
uses this solution in the Console Controller Board examples where there
happen to be two or more resistor components pulling up a single circuit
node.
A third solution is to rely on the intended direction of signal flow between
the components and assume that no fault will cause that to be violated. This
is a way of using the switch model for just those components that really need
it (resistors and switches) while retaining the simpler unidirectional digital
model for everything else. It is the solution that TINT uses everywhere that
the intended signal flow is unidirectional. Shorting faults will be
misdiagnosed as multiple faults among the shorted components, since the effect
of shorts is to cause current to go places where it was not intended to go. In
TTL it is usually the case that each node is driven by only one component,
and if that component does not hold the node to logic-level 0, some other
component pulls it up to logic-level 1. The component driving the node can
simply be modeled as if it pulls the node to 1 itself. Thus the signal flow
appears to be unidirectional. The behavior of TTL components with respect
to qualitative currents is thus approximated using the following rules. First,
if there is no current into the input of a TTL component then the node is
pulled up to 1:

If   ?x is a TTL component
and  [thru ?l1 ?u1 (mode ?x) normal]
and  [thru ?l2 ?u2 (qci (in ?input ?x)) 0]
and  ?input is neither PWR nor GND
and  (overlap (?l1 ?u1) (?l2 ?u2))
Then [thru (max ?l1 ?l2) (min ?u1 ?u2)
           (ll (in ?input ?x)) 1]

Second,  it  has  been  assumed  that  if  a  component  is  not  pulling  its  output 
node  down,  then  it  will  be  pulled  up  to  1;  hence  if  there  is  no  current  flowing 
through  a  pin  intended  to  be  a  TTL  output,  then  the  logic-level  at  the  node 
is  1: 


If   ?x is a TTL component
and  [conn (pin ?i ?c) (hole ?m ?e) (out ?o ?x)]
and  [thru ?l ?u (qci (hole ?m ?e)) 0]
Then [thru ?l ?u (ll (hole ?m ?e)) 1]

These latter two rules are used for all the TTL components in the
implemented model of the Console Controller Board; the board has two CMOS
chips and for the time being they are modeled as if they were TTL as well.
Where resistors appear in wired-OR structures and as pullups for buttons and
switches, the digital current model is used, and since it results in deductions
being made about logic-levels it meshes smoothly with the standard digital
model.
